Question: How to find a GO annotation file and do GO enrichment analysis?
4.2 years ago by
jack750 wrote:

I have a list of differentially expressed genes for Paramecium tetraurelia. I want to do gene ontology enrichment analysis. There are two problems:

1) I couldn't find the GO annotation for Paramecium.

2) Assuming I find the GO annotation for this organism, which tool is the best for GO enrichment analysis?

I have seen one paper in which the authors mention that "We conducted a domain search of the P. bursaria transcripts against the Pfam database release 26.0. Gene ontology (GO) terms were assigned to each transcript using the pfam2go conversion table", but it's not clear to me how this was done.

Can somebody help me with this ?

modified 4.2 years ago by chemcehn180 • written 4.2 years ago by jack750
4.2 years ago by
United States
pld4.8k wrote:

Pfam is a database of protein families. Specifically, they use HMMER to build hidden Markov models, each representing a conserved group of proteins (a family). When proteins are conserved, we assume there is functional similarity. This is a general assumption that can break down in sequence- and species-specific ways, but in general it works.

So if you can establish that a conserved group of proteins (a family) shares some set of functions, you can assume that any member of that family should also have those functions. If these assumptions hold, predicting function becomes a problem of predicting which families a protein may belong to. This is what the authors did: they knew the functions of the families, so to infer the potential functions of their proteins they had to find the families those proteins may belong to.
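To make the pfam2go step concrete, here is a minimal sketch of how it could be scripted. It is not the authors' actual pipeline. It assumes you have run `hmmscan --tblout` against Pfam-A and downloaded the pfam2go mapping file. The function names, the example lines, and the E-value cutoff of 1e-5 are my own choices for illustration (Pfam's gathering thresholds are a common alternative).

```python
from collections import defaultdict

def pfam_hits_from_tblout(lines, max_evalue=1e-5):
    """Collect Pfam accessions per protein from hmmscan --tblout lines.

    Assumed column order: target name, target accession, query name,
    query accession, full-sequence E-value, ...
    The max_evalue cutoff is an illustrative assumption.
    """
    hits = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):  # skip comments/blank lines
            continue
        fields = line.split()
        pfam_acc = fields[1].split(".")[0]  # PF00001.21 -> PF00001
        protein = fields[2]
        if float(fields[4]) <= max_evalue:
            hits[protein].add(pfam_acc)
    return dict(hits)

def parse_pfam2go(lines):
    """Map Pfam accession -> set of GO IDs from pfam2go-formatted lines."""
    mapping = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):  # '!' marks header lines
            continue
        left, _, right = line.partition(" > ")
        if not right:
            continue
        pfam_acc = left.split()[0].replace("Pfam:", "")
        go_id = right.rsplit(";", 1)[-1].strip()
        mapping[pfam_acc].add(go_id)
    return dict(mapping)

def assign_go(protein_to_pfam, pfam2go):
    """Give each protein the union of GO terms of all its Pfam families."""
    return {
        prot: set().union(*(pfam2go.get(fam, set()) for fam in fams))
        for prot, fams in protein_to_pfam.items()
    }

# Illustrative input only (tblout lines truncated to the first 7 columns).
TBLOUT_EXAMPLE = [
    "# target name accession query name accession E-value score bias",
    "7tm_1 PF00001.21 transcript_1 - 1.2e-30 105.3 0.1",
    "Pkinase PF00069.25 transcript_1 - 0.5 8.1 0.0",   # weak hit, filtered out
    "Pkinase PF00069.25 transcript_2 - 3.0e-50 170.2 0.0",
]
PFAM2GO_EXAMPLE = [
    "!Mapping of Pfam families to GO terms (illustrative excerpt)",
    "Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor activity ; GO:0004930",
    "Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor signaling pathway ; GO:0007186",
    "Pfam:PF00069 Pkinase > GO:protein kinase activity ; GO:0004672",
]

if __name__ == "__main__":
    hits = pfam_hits_from_tblout(TBLOUT_EXAMPLE)
    go = assign_go(hits, parse_pfam2go(PFAM2GO_EXAMPLE))
    for protein in sorted(go):
        print(protein, sorted(go[protein]))
```

With real data you would pass open file handles instead of the example lists, since both functions only iterate over lines.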

As for the authors' data: when people do this kind of annotation, you should generally be able to find it either in the supplemental information or, in some cases, by contacting the authors. Always check the supplement in these kinds of papers. For the paper I assume you're referring to, the information is in the supplement (see additional file 3).

Now, as for predicting function through homology: all methods take the same general form, but there are important distinctions. The idea is to infer function by finding which "thing" with known function matches your "thing" of unknown function. The two most common ways are BLAST and HMMER/Pfam. The idea is the same; with BLAST you assign functions through specific sequences (BLAST hits), and with HMMER/Pfam through protein families (as described above).

However, there are important differences. With BLAST you usually infer function from a single best hit, so your unknown is assigned all of the functions that specific protein has. With Pfam, you assign all of the functions for all of the significantly high-scoring Pfam hits. This seems trivial, but it can be important. Pfam only looks at the functions that proteins in a family share, whereas with BLAST you get the functions that are known for that protein in that specific species.

The key difference is "in that specific species": you may see contextual information specific to the species of the known protein. The kicker is that it can be hard to tell whether these "extra" terms are there because the protein does something unique in its host species, or because that species simply has better or more complete annotations. Very few species have concerted efforts to annotate their genomes with GO terms (

If your species isn't phylogenetically close to the species with GO annotation efforts, I would use both approaches. BLAST your genes against, say, UniProt and collect GO terms through the best BLAST hit of each predicted peptide. I would also run HMMER on the predicted peptides and infer functions that way.
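As a rough sketch of combining the two approaches, the snippet below picks the best BLAST hit per query from tabular output (`-outfmt 6`, default column order, bit score in column 12) and merges its GO terms with the Pfam-derived ones. The `subject_go` lookup (hit accession to GO IDs, e.g. built from a UniProt GOA file), the `pfam_go` dictionary, and all identifiers in the example are hypothetical placeholders, not real annotations.

```python
def best_blast_hits(lines):
    """Return {query: subject} keeping the highest bit score per query.

    Assumes BLAST tabular output with the default 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}

def combine_annotations(best_hits, subject_go, pfam_go):
    """Union of GO terms from the best BLAST hit and from Pfam families."""
    proteins = set(best_hits) | set(pfam_go)
    return {
        p: subject_go.get(best_hits.get(p), set()) | pfam_go.get(p, set())
        for p in proteins
    }

# Hypothetical example data; IDs and GO assignments are placeholders.
BLAST_EXAMPLE = [
    "\t".join(["pep1", "subjA", "45.0", "200", "110", "0", "1", "200",
               "5", "204", "1e-40", "150.0"]),
    "\t".join(["pep1", "subjB", "60.0", "210", "84", "0", "1", "210",
               "1", "210", "1e-80", "280.0"]),
    "\t".join(["pep2", "subjA", "38.0", "150", "93", "0", "1", "150",
               "3", "152", "1e-20", "90.0"]),
]
SUBJECT_GO = {"subjA": {"GO:0000001"}, "subjB": {"GO:0000002"}}
PFAM_GO = {"pep1": {"GO:0000003"}, "pep3": {"GO:0000004"}}

if __name__ == "__main__":
    best = best_blast_hits(BLAST_EXAMPLE)
    combined = combine_annotations(best, SUBJECT_GO, PFAM_GO)
    for protein in sorted(combined):
        print(protein, sorted(combined[protein]))
```

Keeping track of which source each term came from (BLAST or Pfam) is a sensible extension, since the species-specific caveat above applies only to the BLAST-derived terms.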

Blast2GO is an option, but it is massively slow if you don't buy the full version; it can take months to annotate a large set of genes/proteins. There are other tools available, as previously mentioned; see if those can help.

If you have any programming/database experience, you can easily write a few scripts to handle this. I prefer this approach because it is easier to integrate into other forms of analysis (either on the transcriptome, etc., or later analysis).
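For the enrichment step itself, one common choice is a one-sided hypergeometric test per GO term (equivalent to Fisher's exact test). The sketch below uses only the standard library (Python 3.8+ for `math.comb`). It assumes the background is every gene that has at least one GO term, and it does not propagate terms up the GO hierarchy or correct for multiple testing, both of which a real analysis would need. The function names and the toy data are mine.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

def go_enrichment(gene2go, de_genes):
    """Return (term, k, K, p_value) tuples sorted by p-value.

    gene2go:  {gene: set of GO IDs}; its keys define the background.
    de_genes: iterable of differentially expressed gene IDs.
    """
    background = set(gene2go)
    study = set(de_genes) & background  # ignore DE genes without annotation
    N, n = len(background), len(study)

    # Invert the annotation: GO term -> genes carrying it.
    term_genes = {}
    for gene, terms in gene2go.items():
        for term in terms:
            term_genes.setdefault(term, set()).add(gene)

    results = []
    for term, genes in term_genes.items():
        k = len(genes & study)
        if k == 0:
            continue
        results.append((term, k, len(genes), hypergeom_sf(k, N, len(genes), n)))
    return sorted(results, key=lambda r: r[3])

# Toy data: 10 annotated genes, placeholder GO IDs.
GENE2GO = {f"g{i}": {"GO:B"} for i in range(1, 11)}
for g in ("g1", "g2", "g3"):
    GENE2GO[g] = {"GO:A", "GO:B"}
DE_GENES = ["g1", "g2", "g3"]

if __name__ == "__main__":
    for term, k, K, p in go_enrichment(GENE2GO, DE_GENES):
        print(term, k, K, p)
```

In the toy data, GO:A is carried only by the three DE genes and ranks first, while GO:B is on every gene and so gets a p-value of 1.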

Or, just use what someone else already did! The data you want is right there in the publication!

modified 4.2 years ago • written 4.2 years ago by pld4.8k

Hi Joe, regarding your very informative comment, I would like to ask about a few points: 1) Does Blast2GO use both the BLAST and HMMER/Pfam approaches? 2) Are there currently tools other than Blast2GO that perform this task efficiently?

Thank you very much in advance! Phuong.

written 3.1 years ago by pbigbig190
4.2 years ago by
dago2.5k wrote:

You could download the Paramecium proteome, which is available on EMBL. Otherwise, you could annotate your proteins with Blast2GO.

Blast2GO also has a function for GO enrichment. Otherwise, you could use one of the other tools listed in this post:

C: Gene Ontology Enrichment Of Non-Model Bacterial Genome


modified 3.1 years ago • written 4.2 years ago by dago2.5k
4.2 years ago by
chemcehn180 wrote:

Did you try DAVID?

modified 4.2 years ago • written 4.2 years ago by chemcehn180