How to map genomic sequence to protein functions or domains?
12 months ago
weixiaokuan ▴ 140

Hi, is it possible to annotate a genomic sequence with protein domains/locations and functions? I know several packages can annotate a genomic sequence to retrieve a protein ID, but I don't know how to further annotate those proteins with domain information and functions. Thank you.

GRange Genome Ensembldb annotation • 665 views
12 months ago
dthorbur ★ 1.9k

I don't think you can simply give a tool an arbitrary genomic sequence and expect it to identify protein functional domains. Even for a simple genomic sequence, the number of possibilities is huge: the tool would have to infer the TSS, intron/exon/UTR boundaries, the reading frame, etc.
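To make the reading-frame point concrete, here is a minimal Python sketch of six-frame translation (three frames on each strand) using the standard genetic code. It ignores introns, UTRs and alternative genetic codes, which would only multiply the candidates further; the function names are illustrative and not taken from any particular tool.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (uppercased)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; '*' marks stops, 'X' marks codons not in the table."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict[str, str]:
    """Translate all three frames of the forward strand and of the reverse complement."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

Even the 9-nt toy sequence `ATGGCCTAA` yields six different peptides (frame `+1` gives `MA*`, frame `-1` gives `LGH`), and nothing in the sequence alone says which one, if any, is the real product.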

To my knowledge, the minimum input would be an mRNA transcript from which the protein sequence can be inferred. If you have transcript coordinates or mRNA sequences, there are plenty of tools. TRAPID is a useful tool designed for RNA experiments that annotates sequences with protein domains, GO terms, KEGG pathways, etc.
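If you do have a transcript model, mapping a genomic position onto the protein is mostly bookkeeping over the CDS segments. The sketch below assumes a single, complete transcript (the first CDS base is the first base of codon 1) whose CDS segments are given explicitly as 1-based inclusive genomic intervals; `genome_to_protein` and its inputs are hypothetical. For Ensembl annotations, the ensembldb Bioconductor package provides this kind of mapping (for example `genomeToProtein()`).

```python
from typing import Optional


def genome_to_protein(pos: int, cds_exons: list[tuple[int, int]], strand: str) -> Optional[tuple[int, int]]:
    """Map a 1-based genomic position to (residue number, codon position).

    cds_exons: CDS segments as 1-based inclusive (start, end) intervals, start <= end.
    strand: "+" or "-". Returns None if the position falls outside the CDS.
    """
    # Walk the segments in the direction of transcription.
    ordered = sorted(cds_exons, reverse=(strand == "-"))
    offset = 0  # number of coding bases in the segments already passed
    for start, end in ordered:
        if start <= pos <= end:
            within = pos - start if strand == "+" else end - pos
            cds_index = offset + within  # 0-based index into the spliced CDS
            return cds_index // 3 + 1, cds_index % 3 + 1
        offset += end - start + 1
    return None
```

With `cds_exons=[(100, 104), (200, 210)]` on the `+` strand, position 201 maps to residue 3, codon position 1, while an intronic position such as 150 returns `None`. The residue number can then be compared with the domain start/stop positions reported by a domain annotation tool.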

12 months ago

If you already have mRNA sequences, InterProScan (https://github.com/ebi-pf-team/interproscan) is probably one of the classic tools for this.

https://interproscan-docs.readthedocs.io/en/latest/Introduction.html#what-is-interproscan

It is all rather well integrated and works well in my experience.
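InterProScan's TSV output gives one row per signature match, with the match start/stop in protein coordinates. Below is a minimal parsing sketch. The column order follows my reading of the InterProScan 5 TSV documentation (protein accession, MD5, length, analysis, signature accession, signature description, start, stop, score, status, date, InterPro accession, InterPro description, then optional GO and pathway columns, with the GO column only populated when GO lookup is enabled), so verify it against the version you run. `DomainHit`, `parse_interproscan_tsv` and `domains_at` are hypothetical helper names.

```python
import csv
from collections import defaultdict
from typing import NamedTuple


class DomainHit(NamedTuple):
    analysis: str       # member database, e.g. Pfam
    signature: str      # signature accession
    description: str    # signature description
    start: int          # 1-based residue coordinates on the protein
    stop: int
    interpro: str       # InterPro accession, "-" if none
    go_terms: list[str]


def parse_interproscan_tsv(lines) -> dict[str, list[DomainHit]]:
    """Group TSV rows by protein accession (assumed column layout, see above)."""
    hits = defaultdict(list)
    # QUOTE_NONE so that quote characters inside descriptions are kept literally.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#"):
            continue
        interpro = row[11] if len(row) > 11 else "-"
        go = row[13].split("|") if len(row) > 13 and row[13] not in ("", "-") else []
        hits[row[0]].append(
            DomainHit(row[3], row[4], row[5], int(row[6]), int(row[7]), interpro, go)
        )
    return dict(hits)


def domains_at(hits: list[DomainHit], residue: int) -> list[DomainHit]:
    """Return the matches whose interval covers the given residue."""
    return [h for h in hits if h.start <= residue <= h.stop]
```

Typical use would be `hits = parse_interproscan_tsv(open("proteins.tsv"))` followed by `domains_at(hits["my_protein"], 57)` to list the domains covering residue 57; the file name and accession here are placeholders.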

Other options are MAKER or AUGUSTUS to produce mRNA or protein sequences (i.e., gene prediction), followed by Blast2GO. You can also look at other good functional annotation tools from the CAFA competition and its past publications: https://biofunctionprediction.org/cafa/
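Between gene prediction and functional annotation, the predicted gene models have to be turned into protein sequences. If your predictor gives you GFF3 models rather than protein FASTA, here is a minimal sketch of that step: it groups `CDS` features by their `Parent` attribute, splices them in genomic order, reverse-complements minus-strand models and translates with the standard code. It assumes one `Parent` per feature, one strand per transcript, complete CDSs (the phase column is ignored) and a genome already loaded as a `{seqid: sequence}` dict; `cds_proteins` is a hypothetical name. For real data, a dedicated tool such as gffread (its `-y` option writes protein sequences) is more robust.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def cds_proteins(gff_lines, genome: dict[str, str]) -> dict[str, str]:
    """Splice CDS features per transcript (GFF3 Parent) and translate them."""
    segments = defaultdict(list)  # transcript id -> [(seqid, start, end, strand)]
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        seqid, _src, ftype, start, end, _score, strand, _phase, attrs = line.rstrip("\n").split("\t")
        if ftype != "CDS":
            continue
        fields = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
        segments[fields["Parent"]].append((seqid, int(start), int(end), strand))

    proteins = {}
    for tx, segs in segments.items():
        segs.sort(key=lambda s: s[1])
        # GFF coordinates are 1-based inclusive; Python slices are 0-based half-open.
        cds = "".join(genome[s[0]][s[1] - 1:s[2]] for s in segs).upper()
        if segs[0][3] == "-":
            cds = cds.translate(COMPLEMENT)[::-1]
        proteins[tx] = "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))
    return proteins
```

The resulting protein sequences are what you would then pass to InterProScan, Blast2GO or similar annotation tools.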
