Question: Functional Annotations After GeneMark-ES
9 months ago
brittanymlebert10 wrote:

This may be a stupid question, but I am very new to bioinformatics. I am trying to annotate a novel fungal genome. I just ran it through GeneMark-ES to annotate it and got a .gtf output. When I put this file into Geneious, it attaches to my FASTA file and shows me all of the introns and exons on the sequences. My question is, how do I go from this to getting functional annotations? I downloaded Blast2GO and it looks like it only needs the FASTA file to run. If this is true, then why did I need to generate the .gtf file? How can it give me functional annotations without it? Thank you so much in advance!

written 9 months ago by brittanymlebert10
9 months ago
lieven.sterck wrote:

You might be mixing up a few (genome annotation) concepts here.

FASTA files hold the actual sequences: either your genome (i.e. the FASTA file you used as input for GeneMark, for instance) or, on the other hand, CDS, proteins, and any other kind of sequence.
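To make that concrete, here is a minimal FASTA reader in plain Python (a sketch using only the standard library, not the parser of any particular tool). It treats a DNA record and a protein record exactly the same way, because a FASTA file is nothing more than a '>' header line followed by sequence lines. The record names are made up for illustration.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream.

    Nothing here is DNA- or protein-specific: a record is a '>' header
    line followed by one or more sequence lines, which are joined.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical file mixing a genomic DNA record and a protein record.
example = io.StringIO(
    ">contig_1 genomic DNA\nATGAAATTT\nGGGTAA\n"
    ">g1.t1 predicted protein\nMKFG*\n"
)
for name, seq in read_fasta(example):
    print(name, seq)
```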

The annotation process of GeneMark is there to give you the locations of genes on your genomic sequence. These are typically provided in formats such as GFF, EMBL or GTF. These files only contain (or in theory should only contain) the coordinates of features that are present on your genomic sequence (this is also called structural annotation).
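As an illustration of "coordinates only", here is a small sketch that reads GTF lines and collects the CDS coordinates per transcript. It assumes the standard 9-column, tab-separated GTF layout and that every CDS row carries a transcript_id attribute; the example lines, contig name and IDs are hypothetical, not real GeneMark-ES output. Note that no sequence appears anywhere, only positions and strands.

```python
from collections import defaultdict


def cds_coordinates(gtf_lines):
    """Group the CDS rows of a GTF file by transcript_id.

    GTF columns: seqname, source, feature, start, end, score, strand,
    frame, attributes. Start and end are 1-based and inclusive.
    Returns {transcript_id: [(seqname, start, end, strand), ...]}.
    """
    transcripts = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # keep only well-formed CDS rows
        attrs = {}
        for item in cols[8].split(";"):
            key, _, value = item.strip().partition(" ")
            if key:
                attrs[key] = value.strip('"')
        # Assumes each CDS row has a transcript_id attribute.
        transcripts[attrs["transcript_id"]].append(
            (cols[0], int(cols[3]), int(cols[4]), cols[6])
        )
    return dict(transcripts)


# Hypothetical two-exon gene on contig_1.
gtf = [
    'contig_1\texample\tCDS\t1\t6\t.\t+\t0\tgene_id "g1"; transcript_id "g1.t1";',
    'contig_1\texample\tintron\t7\t16\t.\t+\t.\tgene_id "g1"; transcript_id "g1.t1";',
    'contig_1\texample\tCDS\t17\t25\t.\t+\t0\tgene_id "g1"; transcript_id "g1.t1";',
]
print(cds_coordinates(gtf))
```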

By combining the two, you can extract the actual gene/CDS sequences from your genomic sequence, and those can then be translated into proteins.
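A rough sketch of that combining step in plain Python, under a few assumptions: the genome is already loaded as a dict of strings, you have the CDS coordinates of one transcript (1-based, inclusive, as in GTF), and the CDS is complete and starts in frame 0. It joins the CDS pieces in genomic order, reverse-complements genes on the minus strand, and translates with the standard genetic code (NCBI translation table 1). That code fits most fungal nuclear genes, but some yeasts (the so-called CTG clade) use an alternative code, so check your organism. The toy genome and coordinates are made up.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def cds_sequence(genome, cds_parts):
    """Splice the CDS pieces of one transcript into one coding sequence.

    genome: dict mapping sequence name to genomic DNA string
    cds_parts: list of (seqname, start, end, strand) tuples with
               1-based, inclusive coordinates, all from one transcript
    """
    parts = sorted(cds_parts, key=lambda p: p[1])  # genomic order
    strand = parts[0][3]
    dna = "".join(genome[name][start - 1:end] for name, start, end, _ in parts)
    # On the minus strand, the joined plus-strand DNA is reverse-complemented.
    return reverse_complement(dna) if strand == "-" else dna


def translate(cds):
    """Translate codon by codon; codons not in the table (e.g. with N) become X."""
    cds = cds.upper()
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )


# Toy genome: exon ATGAAA, intron GTAAGTTTAG, exon TTTGGGTAA.
genome = {"contig_1": "ATGAAAGTAAGTTTAGTTTGGGTAA"}
parts = [("contig_1", 1, 6, "+"), ("contig_1", 17, 25, "+")]
print(translate(cds_sequence(genome, parts)))  # MKFG*
```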

Tools for functional annotation (such as Blast2GO) take the protein sequences you predicted, analyse them and assign potential functions to them (== functional annotation).

written 9 months ago by lieven.sterck

Thank you for your response! I see what you mean about the combination of the FASTA and the .gtf, as there are now automatic translations in Geneious under my sequences. What I'm still confused about is which file I put into Blast2GO, because it is just asking for a FASTA file of actual sequences. Where would the protein sequences come into play if it just needs a FASTA file to run?

written 9 months ago by brittanymlebert10

Blast2GO needs a FASTA file with protein sequences in it.

FASTA files can contain any kind of sequence, not only DNA sequences. So you need to generate a FASTA file with the protein translations of your predicted genes and put that through Blast2GO.
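Writing the translated proteins out as a FASTA file is only a few lines. A minimal sketch, assuming the proteins sit in a dict keyed by a transcript ID of your choosing; the IDs below are placeholders, and the 60-residue line width is just a conventional choice, not something Blast2GO is assumed to require.

```python
import io


def write_protein_fasta(proteins, handle, width=60):
    """Write {protein_id: amino-acid sequence} as FASTA, wrapping at `width`."""
    for protein_id, seq in proteins.items():
        handle.write(f">{protein_id}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


out = io.StringIO()
write_protein_fasta({"g1.t1": "MKFG", "g2.t1": "M" + "A" * 70}, out)
print(out.getvalue())
```

For a real file you would pass an open file handle instead of the StringIO, e.g. with open("proteins.faa", "w") as fh: write_protein_fasta(proteins, fh), where the filename is only an example.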

written 9 months ago by lieven.sterck

Thank you very much! I understand now. I have been looking for a simple way to get a FASTA file of the protein translations from a genomic FASTA file and a .gtf file. Do you have any suggestions?

written 9 months ago by brittanymlebert10

What have you found so far?

bedtools getfasta must have come up in your search results, no?
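If you go the bedtools route with a BED file, keep in mind that BED coordinates are 0-based with an exclusive end, while GTF is 1-based and inclusive, so only the start shifts by one. A small sketch that turns CDS coordinates into 6-column BED rows (chrom, start, end, name, score, strand), so that the strand column is available for strand-aware extraction with -s. The transcript name and contig are hypothetical, and the score is just set to 0. Also note that bedtools getfasta only extracts DNA; with a plain 6-column BED each interval comes out as its own record, so the CDS pieces would still have to be joined per transcript and translated.

```python
def to_bed_rows(name, cds_parts):
    """Convert (seqname, start, end, strand) tuples with GTF-style 1-based,
    inclusive coordinates into 6-column BED rows (0-based start, exclusive end)."""
    return [
        f"{seqname}\t{start - 1}\t{end}\t{name}\t0\t{strand}"
        for seqname, start, end, strand in cds_parts
    ]


rows = to_bed_rows("g1.t1", [("contig_1", 1, 6, "+"), ("contig_1", 17, 25, "+")])
print("\n".join(rows))
```

The resulting file could then be used along the lines of bedtools getfasta -fi genome.fa -bed cds.bed -s, with genome.fa and cds.bed standing in for your own file names.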

written 9 months ago by lieven.sterck

I actually found a way to just download the translated sequences from Geneious, which was really easy. Thank you for all of your help. I really appreciate it!

written 9 months ago by brittanymlebert10
