Question: Best strategy to get GO terms for a proteome?
5.7 years ago by
United States
biotech540 wrote:


I would like to know the best strategy to get the largest number of GO terms for the bacterial proteome I'm working on. Since it's a non-model organism, I will build the GO database from scratch.

I obtained GO annotations for 60% of the proteome by BLASTing against the bacterial nr protein database (retaining the first 20 hits), but some of the terms are very general. I obtained the same results with BLAST2GO InterPro mapping.

I've been thinking of running BLAST against both the UniProt and nr databases and merging the results. I would also like to know how many hits I should retain from the BLAST searches.

Thanks, Bernardo


P.S. I've also posted this question on the SEQanswers forum.

uniprot blast blast2go go nr • 2.0k views
12 weeks ago by
predeus1.3k wrote:

In case somebody is searching for this: the newest version of InterProScan works very nicely and adds GO terms as well as Reactome pathways to the annotation, based on the domains it discovers. Of course, these approaches have a number of limitations, but I still think this is the easiest way to do it. It took me about 8 hours on 64 cores for 38,000 proteins. The options I used:

-i <protein_fasta> -f tsv -b <output> -goterms -cpu 64 -etra -pa
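Once the run finishes, the GO terms still have to be pulled out of the TSV file. Here is a minimal Python sketch of that step, assuming the usual InterProScan 5 TSV layout in which the protein ID is column 1 and the GO annotations are column 14, separated by "|". The function name is made up for illustration, and the regular expression is used so that IDs carrying a source suffix in some releases are still picked up.

```python
import re
from collections import defaultdict

# GO IDs are "GO:" followed by seven digits; anything after that is ignored.
GO_ID = re.compile(r"GO:\d{7}")


def go_terms_from_interproscan_tsv(lines):
    """Map each protein ID to the set of GO IDs found in InterProScan TSV rows.

    Assumes column 1 is the protein ID and column 14 holds GO annotations.
    Proteins without any GO term are left out of the result.
    """
    go_by_protein = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 14:
            continue  # row has no GO column (run without -goterms, or no InterPro match)
        go_ids = GO_ID.findall(fields[13])
        if go_ids:
            go_by_protein[fields[0]].update(go_ids)
    return dict(go_by_protein)
```

In practice you would pass it an open file handle for the TSV output, e.g. `with open(path) as fh: terms = go_terms_from_interproscan_tsv(fh)`, where `path` is whatever file your run produced.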

5.7 years ago by
United States
pld4.8k wrote:

You should retain only one hit per subject species, and these hits should be verified through reciprocal BLAST. Multiple hits in a single species for a given gene of interest don't make sense: it's equivalent to saying your gene of interest does all of the functions of the n genes in the subject species.
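To make the reciprocal check concrete, here is a rough Python sketch for one pair of proteomes. It assumes both searches were saved in the default BLAST tabular format (`-outfmt 6`), where column 1 is the query, column 2 the subject, and column 12 the bit score. The function names are hypothetical, and choosing the best hit by bit score is a simplification; you may also want e-value or coverage cutoffs.

```python
def best_hits(lines):
    """Return {query: subject}, keeping the highest-bit-score hit per query."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_lines, reverse_lines):
    """Keep query -> subject pairs whose subject's best hit is the query again.

    forward_lines: your proteome searched against the subject species.
    reverse_lines: the subject species searched against your proteome.
    """
    forward = best_hits(forward_lines)
    reverse = best_hits(reverse_lines)
    return {q: s for q, s in forward.items() if reverse.get(s) == q}
```

Running this once per subject species gives at most one verified hit per species for each query, which is the filtering described above.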

You can merge results, but I imagine that you'll get a lot of duplicates. I'm not very familiar with BLAST2GO, but RefSeq doesn't natively carry GO annotation, so B2G must be pulling those terms from somewhere else.
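If you do merge, the duplicates are easy to deal with as long as each source is first reduced to a mapping from protein ID to a set of GO IDs. Here is a small Python sketch of that idea; the function name and the dictionary layout are assumptions for illustration, not something BLAST2GO produces directly.

```python
def merge_go_annotations(*sources):
    """Union several {protein_id: set_of_GO_ids} mappings into one.

    Because the values are sets, a term reported by more than one source
    is stored only once per protein.
    """
    merged = {}
    for source in sources:
        for protein_id, terms in source.items():
            merged.setdefault(protein_id, set()).update(terms)
    return merged
```

For example, merging `{"P1": {"GO:0005515"}}` from one search with `{"P1": {"GO:0005515", "GO:0003677"}}` from another leaves P1 with two terms rather than three.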

Having many generic results after GO annotation is common, at least in my experience. If a given gene's orthologs are poorly annotated, you can't do any better.

