Question: Generate GTF/GFF file (coordinates) from a FASTA annotated file.
Asked 13 months ago by marquezg48:

Hello everyone!

I have a FASTA file from a de novo transcriptome assembly made with Trinity. The assembled sequences were annotated with BLASTP and BLASTX, and now I have headers like these in my FASTA file:

>Q86TG7|PEG10_HUMAN
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

>P11369|POL2_MOUSE
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXX

With this file I'd like to create a GFF/GTF file with coordinates, maybe by mapping this assembly to the reference genome. The problem is that the available genome annotation is problematic, and it would be easier to have a new GFF/GTF file built from these data.

I'm not sure which tool would be helpful for this task. I read that PASA could be useful, but I didn't really understand how to use it for this.

Could you help me?

written 13 months ago by marquezg48

Would you like to update an existing structural annotation based on this new data? Or do you want an annotation based only on this Trinity data?

You can indeed use PASA, or GAWN or MAKER. What is your problem with PASA? You have to provide the Trinity FASTA file, the genome and a list of alignment tools to use (GMAP and/or BLAT).

written 13 months ago by Juke34

Start by mapping the transcripts to the genome with GMAP (it's good), using GFF3 output.
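
A minimal sketch of that mapping step, assuming GMAP is installed and on your PATH. The file names, database name and directory (`genome.fa`, `Trinity.fasta`, `my_genome`, `gmap_db`) are placeholders, not anything from the original post. It only assembles the two command lines: `gmap_build` to index the genome, then `gmap` with `-f gff3_gene` to write the alignments as GFF3.

```python
import shlex


def gmap_commands(genome_fasta, transcripts_fasta,
                  db_name="my_genome", db_dir="gmap_db", threads=4):
    """Return (build, align) argument lists for a basic GMAP run.

    All names and defaults here are hypothetical placeholders.
    """
    # Step 1: build the GMAP genome index inside db_dir.
    build = ["gmap_build", "-D", db_dir, "-d", db_name, genome_fasta]
    # Step 2: align the transcripts against that index.
    align = [
        "gmap", "-D", db_dir, "-d", db_name,
        "-f", "gff3_gene",   # GFF3 output with gene/mRNA/exon features
        "-n", "1",           # keep a single best path per transcript
        "-t", str(threads),  # number of worker threads
        transcripts_fasta,
    ]
    return build, align


if __name__ == "__main__":
    build, align = gmap_commands("genome.fa", "Trinity.fasta")
    print(shlex.join(build))
    # GMAP writes to standard output, so redirect it into a file.
    print(shlex.join(align) + " > trinity_vs_genome.gff3")
```

Running the script prints the two commands so you can check them before launching anything; paste them into a shell once the paths match your own files.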

PASA is more difficult, but also very good.

MAKER is pretty simple, but benefits from multiple evidence sources.

Always think about gene names and versioning in your annotation approach, as people using your annotation get annoyed when things change or "disappear" with updates.
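
As a small illustration of the naming point, here is a sketch that turns headers like `>Q86TG7|PEG10_HUMAN` into unique transcript IDs. It assumes that several assembled sequences may share the same best hit and therefore the same header; the `.t1`, `.t2` suffix scheme and the function name are made up for this example, not an established convention.

```python
from collections import Counter


def unique_ids(headers):
    """Map FASTA headers such as '>Q86TG7|PEG10_HUMAN' to unique IDs.

    The '<entry name>.t<N>' scheme is a hypothetical convention.
    """
    seen = Counter()
    ids = []
    for header in headers:
        # Split 'ACCESSION|ENTRY_NAME'; fall back to the accession alone
        # when the header contains no '|'.
        accession, _, entry = header.lstrip(">").strip().partition("|")
        name = entry or accession
        seen[name] += 1
        ids.append(f"{name}.t{seen[name]}")
    return ids


if __name__ == "__main__":
    print(unique_ids([">Q86TG7|PEG10_HUMAN",
                      ">P11369|POL2_MOUSE",
                      ">Q86TG7|PEG10_HUMAN"]))
    # prints ['PEG10_HUMAN.t1', 'POL2_MOUSE.t1', 'PEG10_HUMAN.t2']
```

The suffix depends on input order, so keep a table mapping the old headers to the new IDs if you want names to stay traceable between annotation versions.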

written 13 months ago by colindaven

Exactly what you're asking in your second question: I'd like to generate a new annotation based only on the FASTA file that I have. My problem with PASA is that, as I've never done this kind of bioinformatics task, I'm not sure what I need in order to run it. Now I understand that I need an alignment tool (as colindaven and you have said, I could use GMAP) or an alignment file (if I use MAKER), the FASTA file and my genome. Do you know if there's a pipeline for dummies (me)? I only found this one: https://github.com/PASApipeline/PASApipeline/wiki/PASA_comprehensive_db

I'm not sure if it is the best option.

Thanks and greetings!

written 13 months ago by marquezg48