Generate a GTF/GFF file (coordinates) from an annotated FASTA file.
4.8 years ago
marquezg48 • 0

Hello everyone!

I have a FASTA file from a de novo transcriptome assembly made with Trinity. The assembled sequences were annotated with BLASTP and BLASTX, and now I have headers like these in my FASTA file:

>Q86TG7|PEG10_HUMAN            
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

>P11369|POL2_MOUSE                 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX              
XXXXXXXXXXXXX

With this file I'd like to create a GFF/GTF file with coordinates, maybe by mapping this assembly to the reference genome. The problem is that the available genome annotation is problematic, and it would be easier to work with a new GFF/GTF file built from these data.

I'm not sure which tool can be helpful for this task. I read PASA could be useful, but didn't really understand the way to do such a thing.

Could you help me?

Assembly genome • 2.9k views

Would you like to update an existing structural annotation based on this new data, or do you want an annotation based only on this Trinity data?

You can indeed use PASA, GAWN or MAKER. What is your problem with PASA? You have to provide the Trinity FASTA file, the genome and a list of alignment tools to use (GMAP and/or BLAT).


Start by mapping the transcripts to the genome with GMAP (it's good), using GFF3 output.
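As a rough illustration of that first step, here is a minimal Python sketch that assembles the two GMAP command lines (index the genome with `gmap_build`, then align with `gmap` and GFF3 output). The file names, the database name `genome`, the directory `gmap_db` and the helper names are all placeholders I made up, and the option choices (`-f gff3_gene`, `-n 1`) are just one reasonable starting point, so check them against `gmap --help` for your installed version.

```python
import shlex
import subprocess


def gmap_commands(genome_fasta, transcripts_fasta, db_dir="gmap_db",
                  db_name="genome", threads=4):
    """Return (build_cmd, align_cmd) as argument lists for GMAP.

    All paths and names are placeholders chosen for this sketch.
    """
    # Step 1: build a GMAP index of the reference genome.
    build_cmd = ["gmap_build", "-D", db_dir, "-d", db_name, str(genome_fasta)]
    # Step 2: align the transcripts and write gene-style GFF3 to stdout.
    align_cmd = [
        "gmap", "-D", db_dir, "-d", db_name,
        "-f", "gff3_gene",   # gene/mRNA/exon/CDS features in GFF3
        "-n", "1",           # report at most one alignment path per transcript
        "-t", str(threads),  # number of worker threads
        str(transcripts_fasta),
    ]
    return build_cmd, align_cmd


def run_gmap(genome_fasta, transcripts_fasta, out_gff3, **kwargs):
    """Run both steps; requires gmap_build and gmap on the PATH."""
    build_cmd, align_cmd = gmap_commands(genome_fasta, transcripts_fasta, **kwargs)
    subprocess.run(build_cmd, check=True)
    with open(out_gff3, "w") as out:
        subprocess.run(align_cmd, stdout=out, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be inspected first.
    build, align = gmap_commands("genome.fa", "trinity_annotated.fasta")
    print(shlex.join(build))
    print(shlex.join(align) + " > trinity_vs_genome.gff3")
```

Printing the commands first makes it easy to run them by hand in a terminal; `run_gmap(...)` does the same thing from Python once the paths are right.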

PASA is more difficult, but also very good.

MAKER is pretty simple, but benefits from multiple evidence sources.

Always think about gene names and versioning in your annotation approach, as people using your annotation get annoyed when things change or "disappear" with updates.
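On the naming point, one practical wrinkle with headers like `>Q86TG7|PEG10_HUMAN` is that several assembled transcripts may carry the same BLAST hit as their name, and the headers in the question also have trailing spaces. Below is a small sketch of one way to derive unique, versioned IDs before mapping. The `_tN.vN` suffix scheme and the function name are purely hypothetical, not a standard; use whatever convention suits your project, as long as it stays stable between releases.

```python
from collections import Counter


def unique_ids(headers, version=1):
    """Map raw FASTA headers to unique, whitespace-free, versioned IDs.

    The "_t<N>.v<version>" suffix is an illustrative convention only.
    Assumes every header has at least one non-space character after '>'.
    """
    seen = Counter()
    ids = []
    for header in headers:
        # Drop the leading '>' and anything after the first whitespace.
        base = header.lstrip(">").split()[0]
        seen[base] += 1
        ids.append(f"{base}_t{seen[base]}.v{version}")
    return ids


if __name__ == "__main__":
    example = [
        ">Q86TG7|PEG10_HUMAN            ",
        ">P11369|POL2_MOUSE                 ",
        ">Q86TG7|PEG10_HUMAN",
    ]
    for new_id in unique_ids(example):
        print(new_id)
```

Keeping a table of old header to new ID next to the annotation makes it possible to trace names back after an update.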


Exactly what you're asking in your second question: I'd like to generate a new annotation based only on the FASTA file that I have. My problem with PASA is that, as I've never done this kind of bioinformatics task, I'm not sure what I need in order to run it. Now I understand that I need the FASTA file, my genome, and an alignment tool (as colindaven and you have said, I could use GMAP) or a file (if I use MAKER). Do you know if there's a pipeline for dummies (me)? I only found this one: https://github.com/PASApipeline/PASApipeline/wiki/PASA_comprehensive_db

I'm not sure if it is the best option.

Thanks and greetings!
