Question: How Much Do You Trust Gene Ontology Annotations?
9
8.5 years ago by
London, UK
Giovanni M Dall'Olio26k wrote:

The Gene Ontology (GO) is a nice project that provides a standard terminology for genes and gene functions, helping to avoid synonyms and misspellings when describing a gene.

I have been using the Gene Ontology for a while, but honestly I think that it contains many errors and that many terms have not enough terms associated. Moreover, the terminology they use is not always clear and there are some duplications.

Articles and slideshows frequently include charts where the GO classification is used to infer the properties of a set of genes, but I wonder whether the authors check the GO annotations they use.

What is your experience with GO?

gene function subjective • 3.5k views
10
7.6 years ago by
Charles B.160
Rennes
Charles B.160 wrote:

First, sorry if my English is not good!

many terms have not enough terms associated

I presume you mean that many [genes] have not enough terms associated. There are two things to take into account:

  • GO uses the True Path Rule: if a gene is annotated with a term, it is also implicitly annotated with all the parents of that term, up to the root. Making this extension is crucial in terms of inference (Seung Yon Rhee, Valerie Wood, Kara Dolinski, and Sorin Draghici. Use and misuse of the gene ontology annotations. Nature Reviews Genetics, 9(7):509-515, 2008, http://bio.lmu.de/~parsch/evogen/GOreview2008.pdf).

  • Not all species and not all metabolisms are equal in terms of annotation: the more a gene is studied, the more annotations it has.
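To make the True Path Rule concrete, here is a minimal Python sketch. The small `PARENTS` graph and its term names are made up for illustration (they are not real GO identifiers), and `ancestors` and `propagate` are hypothetical helper names; a real analysis would load the ontology from the files GO distributes.

```python
# Hypothetical toy ontology: child term -> set of direct parent terms.
# Term names are illustrative only, not real GO identifiers.
PARENTS = {
    "root": set(),
    "metabolic process": {"root"},
    "lipid metabolic process": {"metabolic process"},
    "transport": {"root"},
    "lipid transport": {"transport"},
}

def ancestors(term, parents=PARENTS):
    """Return every ancestor of `term` (excluding the term itself), up to the root."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

def propagate(direct_terms, parents=PARENTS):
    """Apply the True Path Rule: direct annotations plus all their ancestors."""
    full = set(direct_terms)
    for term in direct_terms:
        full |= ancestors(term, parents)
    return full

# Hypothetical gene annotated directly with a single specific term.
gene_terms = propagate({"lipid metabolic process"})
print(sorted(gene_terms))
```

A gene annotated only with the most specific term ends up associated with its parent and with the root as well, which is why counting direct annotations alone can make a term look under-populated.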

there are some duplications

I asked the GO team about this, and they answered that each duplicated annotation has a different Evidence Code, reflecting different levels of study. So if you use GO for semantic enrichment or inference, remember to remove all duplicates. But if you are interested in Evidence Codes, the duplicates may be useful to you.
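As a rough sketch of both options, assuming the annotations have already been parsed into (gene, term, evidence code) tuples: the gene and term names below are made up, and `unique_pairs` and `evidence_by_pair` are hypothetical helpers, not part of any GO tool.

```python
# Hypothetical annotation records: (gene, GO term, evidence code).
# Gene and term names are made up; IDA, IEA and IMP are real GO evidence codes.
annotations = [
    ("geneA", "term1", "IDA"),
    ("geneA", "term1", "IEA"),  # same gene/term pair, different evidence code
    ("geneA", "term2", "IMP"),
    ("geneB", "term1", "IEA"),
]

def unique_pairs(records):
    """Collapse duplicates by ignoring the evidence code."""
    return sorted({(gene, term) for gene, term, _evidence in records})

def evidence_by_pair(records):
    """Keep the information in the duplicates: all evidence codes per gene/term pair."""
    grouped = {}
    for gene, term, evidence in records:
        grouped.setdefault((gene, term), set()).add(evidence)
    return grouped

pairs = unique_pairs(annotations)
codes = evidence_by_pair(annotations)
print(pairs)
print(codes[("geneA", "term1")])
```

The first function is what you would want before enrichment or inference; the second keeps the evidence codes around in case they matter for your question.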

Evidence Codes are a delicate point in the Gene Ontology. I quote the GO documentation: "Evidence codes are not statements of the quality of the annotation. Within each evidence code classification, some methods produce annotations of higher confidence or greater specificity than other methods, in addition the way in which a technique has been applied or interpreted in a paper will also affect the quality of the resulting annotation. Thus evidence codes cannot be used as a measure of the quality of the annotation."

So an evidence code carries information, but it should not be used to quantify the quality of an annotation. It is a matter of higher confidence or greater specificity... The nuance is subtle.

I really recommend reading the article by Rhee et al. that I cited above to make better use of GO.

9
8.5 years ago by
Jason840
United States
Jason840 wrote:

In my experience it's case by case. In other words, just because you are getting significant p-values does not mean the results are biologically significant. I once submitted clusters of microarray data and received a bunch of hits that were significant by p-value but didn't really have a theme. The GO terms I saw came from many different processes, with no overall term (besides biological process) linking them together. When I've looked at published GO term searches, I generally see a strong theme among many of the terms (however, that doesn't necessarily mean it has biological significance until tested empirically). So seeing themes among your terms may suggest higher significance, but it should make biological sense too.
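For context on where such p-values typically come from: many enrichment tools use a one-sided hypergeometric test, although which test was used in the case above is not stated. The sketch below assumes that test; the function name `enrichment_p_value` and all the numbers are made up for illustration.

```python
from math import comb

def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric p-value: P(X >= hits).

    population: number of genes in the background
    annotated:  background genes carrying the GO term
    sample:     genes in the cluster being tested
    hits:       cluster genes carrying the GO term
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Made-up numbers for illustration only.
p = enrichment_p_value(population=10000, annotated=100, sample=50, hits=5)
print(f"p = {p:.2e}")
```

The test only asks whether a term is over-represented in the cluster compared with the background. It says nothing about whether the significant terms form a coherent biological theme, which is exactly the check described above.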

8
8.3 years ago by
Madelaine Gogol5.0k
Kansas City
Madelaine Gogol5.0k wrote:

If you find an error or have a suggestion for a GO term, you can submit it on SourceForge.

You may also find this chart useful: it lists the evidence codes and how the annotators arrive at them.

If you're wary of the results, you could try only using certain conservative evidence codes. Just like any biological database, it's a work in progress, but I think a lot of people have found it quite useful.
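A minimal sketch of that kind of filter, assuming annotations are available as (gene, term, evidence code) tuples. The six codes below are GO's experimental evidence codes; treating exactly this set as "conservative" is my assumption, and the gene and term names are made up.

```python
# Experimental evidence codes defined by GO; treating exactly this set as
# "conservative" is an assumption of this sketch, not a GO recommendation.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}

# Hypothetical records: (gene, GO term, evidence code).
annotations = [
    ("geneA", "term1", "IDA"),
    ("geneA", "term2", "IEA"),  # inferred from electronic annotation
    ("geneB", "term1", "ISS"),  # inferred from sequence similarity
    ("geneB", "term3", "IMP"),
]

def keep_codes(records, allowed=EXPERIMENTAL_CODES):
    """Return only the records whose evidence code is in `allowed`."""
    return [record for record in records if record[2] in allowed]

filtered = keep_codes(annotations)
print(filtered)
```

Filtering this way narrows the kind of evidence behind each annotation. It will also shrink the annotation set, sometimes a lot for less-studied organisms, so it is worth comparing results with and without the filter.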


Thank you for the answer. I have already done so, and I can assure you they are very keen to respond.

Reply written 8.3 years ago by Giovanni M Dall'Olio
5
8.3 years ago by
Nathan Harmston1.1k
London
Nathan Harmston1.1k wrote:

I think that GO is very useful, especially for exploratory data analysis, but it does have a number of problems (its weird graph structure, for one). I found it very useful for getting a feel for my data and what was going on (in my case, GSEA of differential gene expression data).

I read this paper a while ago:

Quantifying the biological significance of gene ontology biological processes - implications for the analysis of systems-wide data

http://www.ncbi.nlm.nih.gov/pubmed/19965879

Although I have a few minor issues with some of the methods, the paper has a very clear message: the Gene Ontology contains some terms and relationships that are artifacts of human annotation and should be removed prior to analysis (depending, of course, on the analysis); otherwise they will bias any statistics or conclusions.

Annotation bias of GO terms is problematic... but unfortunately that's the way it works.

2
7.6 years ago by
Thaman3.2k
Finland
Thaman3.2k wrote:

Statistically, data can be quantitative or qualitative and, as Istvan said, GO is more subjective, which means GO terms are qualitative. The p-value just measures how extreme the result of the given test is. So we can't say that GO terms are right or wrong; instead, we can rely on their accuracy.


I assure you that some terms can simply be wrong, as there are mistakes in GO just as in any other dataset. The good thing about GO is that you can look at their bug tracker on SourceForge and see them. For example, I once found a wrong association between a gene and its localization GO term.

Reply written 7.6 years ago by Giovanni M Dall'Olio
1
8.5 years ago by
Istvan Albert ♦♦ 77k
University Park, USA
Istvan Albert ♦♦ 77k wrote:

The GO terms and classifications are primarily based on opinions: a small group of people's interpretation of what the current state of knowledge is. Thus they are more subjective than, say, experimental measurements would be.

In fact it is surprising that it works at all, and it does indeed. We just need to be careful not to read too much into it.


It works because they are very active in developing it and the review process is very transparent. If you look at their bug tracker on SourceForge, you will see a lot of discussion there.

Reply written 8.3 years ago by Giovanni M Dall'Olio