Hello everyone!
I have a FASTA file from a de novo transcriptome assembly made with Trinity. The assembled sequences were annotated with BLASTP and BLASTX, and now I have headers like these in my FASTA file:
>Q86TG7|PEG10_HUMAN
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>P11369|POL2_MOUSE
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXX
With this file I'd like to create a GFF/GTF file with coordinates, maybe by mapping this assembly to the reference genome. The problem is that the available genome annotation is problematic, so it would be easier to have a new GFF/GTF file built from these data.
I'm not sure which tool would help with this task. I read that PASA could be useful, but I didn't really understand how to use it for this.
Could you help me?
Would you like to update an existing structural annotation based on this new data? Or do you want an annotation based only on this Trinity data?
You can indeed use PASA, GAWN or MAKER. What is your problem with PASA? You have to provide the Trinity FASTA file, the genome and a list of alignment tools to use (GMAP and/or BLAT).
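To make the three PASA inputs concrete, here is a minimal Python sketch that only assembles the command line and prints it. The file names (genome.fasta, Trinity.fasta, alignAssembly.config) are placeholders, the script is assumed to be on your PATH, and the flags are the ones I recall from the PASA wiki, so please check them against your installed version before running anything.

```python
import shlex

def pasa_command(genome_fasta, transcripts_fasta,
                 config="alignAssembly.config",
                 aligners=("gmap", "blat"), cpus=4):
    """Build the argument list for a PASA alignment-assembly run.

    All file names are placeholders; the config file has to exist and
    name the PASA database before the command is actually run.
    """
    return [
        "Launch_PASA_pipeline.pl",          # assumed to be on PATH
        "-c", config,                       # PASA config file
        "-C",                               # create the database
        "-R",                               # run alignment and assembly
        "-g", genome_fasta,                 # reference genome
        "-t", transcripts_fasta,            # Trinity transcripts
        "--ALIGNERS", ",".join(aligners),   # GMAP and/or BLAT
        "--CPU", str(cpus),
    ]

cmd = pasa_command("genome.fasta", "Trinity.fasta")
print(shlex.join(cmd))
```

Once the printed command looks right, it can be pasted into a shell or passed to subprocess.run(cmd, check=True).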
Start by mapping with GMAP (it's good), using GFF3 output.
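As a rough sketch of that GMAP step: the Python snippet below builds the two commands (index the genome, then align the transcripts with GFF3 output) and prints them. The database name, index directory and file names are made up for illustration. GMAP writes the GFF3 to standard output, so the second command needs to be redirected to a file.

```python
import shlex

def gmap_commands(genome_fasta, transcripts_fasta,
                  db_name="genome_db", db_dir="gmap_index", threads=4):
    """Return (build, align) argument lists for a basic GMAP run.

    db_name and db_dir are arbitrary placeholder names for the index.
    """
    # Step 1: build the genome index once.
    build = ["gmap_build", "-D", db_dir, "-d", db_name, genome_fasta]
    # Step 2: align the transcripts and emit gene models as GFF3.
    align = [
        "gmap", "-D", db_dir, "-d", db_name,
        "-f", "gff3_gene",    # GFF3 with gene/mRNA/exon features
        "-n", "1",            # keep only the best path per transcript
        "-t", str(threads),
        transcripts_fasta,
    ]
    return build, align

build, align = gmap_commands("genome.fasta", "Trinity.fasta")
print(shlex.join(build))
print(shlex.join(align) + " > trinity_gmap.gff3")
```

Keeping only one path per transcript is a design choice for a first pass. Raising -n would also report secondary placements, which can matter for paralogs.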
PASA is more difficult, but also very good.
MAKER is pretty simple, but benefits from multiple evidence sources.
Always think about gene names and versioning in your annotation approach, as people using your annotation get annoyed when things change or "disappear" with updates.
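On the naming point, one thing worth checking before mapping: the headers above look like a protein accession plus an entry name rather than a contig ID, so if two contigs hit the same protein they could end up sharing a header, and GFF3 IDs need to be unique. Below is a small Python sketch that parses such headers and appends a running suffix. The ".tN" suffix scheme and the repeated third header in the example are my own illustration, not something from your file.

```python
from collections import Counter

def header_ids(fasta_lines):
    """Yield (accession, entry_name) for each '>ACC|NAME' header line."""
    for line in fasta_lines:
        if not line.startswith(">"):
            continue                      # skip sequence lines
        fields = line[1:].split()
        if not fields:
            continue                      # skip empty headers
        accession, _, entry = fields[0].partition("|")
        yield accession, entry

def unique_ids(ids):
    """Give every record its own ID by numbering repeated accessions."""
    seen = Counter()
    result = []
    for accession, _entry in ids:
        seen[accession] += 1
        result.append(f"{accession}.t{seen[accession]}")
    return result

# Toy input; the third header is a hypothetical duplicate hit.
example = [
    ">Q86TG7|PEG10_HUMAN", "XXXX",
    ">P11369|POL2_MOUSE", "XXXX",
    ">Q86TG7|PEG10_HUMAN", "XXXX",
]
ids = list(header_ids(example))
print(unique_ids(ids))  # ['Q86TG7.t1', 'P11369.t1', 'Q86TG7.t2']
```

Whatever scheme you pick, keeping a table of old header to new ID makes it much easier to keep names stable across later versions of the annotation.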
Exactly what you're asking in your second question: I'd like to generate a new annotation based only on the FASTA file that I have. My problem with PASA is that, as I've never done this kind of bioinformatics task, I'm not sure what I need to do it. Now I understand that I need the alignment tool (as colindaven and you have said, I could use GMAP) or file (if I use MAKER), the FASTA file and my genome. Do you know if there's a pipeline for dummies (me)? I only found this one: https://github.com/PASApipeline/PASApipeline/wiki/PASA_comprehensive_db
Not sure if it is the best option.
Thanks and greetings!