Question: Simple GO Analysis For Gene Expression Microarrays
6.7 years ago by
United States
pld (4.8k) wrote:

I was wondering if anyone has suggestions for a fairly simple GO term enrichment approach that would benefit a survivor/non-survivor study. Currently I have just been comparing the n most represented terms among the n most differentially expressed genes, e.g., these are the top n terms in the survivors and these are the top n terms in the non-survivors, filtering for common terms. I've also ranked terms by the average change in expression, between up- and down-regulated genes.
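A minimal sketch of the top-n term comparison described above. The function names and the `gene_to_terms` mapping (gene ID to a list of GO term IDs) are hypothetical, as are the toy gene and term labels. Since "filtering for common terms" could mean either keeping or dropping the shared terms, the sketch returns the shared and the group-specific terms separately.

```python
from collections import Counter


def top_terms(genes, gene_to_terms, n):
    """Return the n most frequent GO terms among the given genes."""
    counts = Counter(
        term for gene in genes for term in gene_to_terms.get(gene, ())
    )
    return [term for term, _ in counts.most_common(n)]


def compare_term_lists(terms_a, terms_b):
    """Split two term lists into (shared, only in a, only in b)."""
    shared = set(terms_a) & set(terms_b)
    only_a = [t for t in terms_a if t not in shared]
    only_b = [t for t in terms_b if t not in shared]
    return shared, only_a, only_b


# Toy data: hypothetical gene IDs and term labels, not real annotations.
gene_to_terms = {
    "g1": ["t1", "t2"],
    "g2": ["t1"],
    "g3": ["t3", "t2"],
    "g4": ["t3"],
}
survivor_terms = top_terms(["g1", "g2"], gene_to_terms, n=2)
non_survivor_terms = top_terms(["g3", "g4"], gene_to_terms, n=2)
shared, survivor_only, non_survivor_only = compare_term_lists(
    survivor_terms, non_survivor_terms
)
```

Raw term counts favour terms that annotate many genes in general, which is one reason an enrichment test against a background set is usually preferred over a plain top-n comparison.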

Would it be sufficient to continue with the current approach, or would it be worth exploring other avenues (and if so, what would be some reasonable approaches)? I have played around with DAVID to a degree, but the goal behind introducing GO was to add coarse-grained support to the pathway analysis. DAVID results would warrant a more in-depth follow-up, and I want to avoid project creep.

UPDATE: The arrays came from another study in which RNA was collected only from experimental animals, and not all of them can be assumed to have been exposed to the same experimental conditions. Animals considered non-survivors here are those that met endpoint criteria before the end of the study. Survivors are those that had not yet reached endpoint criteria when the study period ended. The RNA was collected from liver tissue. This means that the samples are not time-matched and there is no data for control animals. The goal is to gain insight into the differences between the two experimental classes.

Differentially expressed here means a gene with a significant (p <= 0.05) fold change (FC) of at least 2.0 between the non-survivors and survivors. The arrays were processed with limma using background subtraction and normalized with VSN.
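A minimal sketch of that cutoff, assuming the results have been exported as (gene, log2 fold change, p-value) rows. On a log2 scale an FC of at least 2.0 in either direction corresponds to |log2 FC| >= 1. The function names, the row layout, and the sign convention (positive means higher in non-survivors) are assumptions made for illustration.

```python
import math


def is_differentially_expressed(log2_fc, p_value, min_fc=2.0, max_p=0.05):
    """Apply the FC and p-value cutoffs to one gene (FC given on a log2 scale)."""
    return abs(log2_fc) >= math.log2(min_fc) and p_value <= max_p


def split_by_direction(rows, min_fc=2.0, max_p=0.05):
    """Return (up, down) gene lists from (gene, log2_fc, p_value) rows.

    Assumed sign convention: positive log2_fc means higher in non-survivors.
    """
    up, down = [], []
    for gene, log2_fc, p_value in rows:
        if is_differentially_expressed(log2_fc, p_value, min_fc, max_p):
            (up if log2_fc > 0 else down).append(gene)
    return up, down


# Toy rows with hypothetical gene IDs.
rows = [
    ("a", 1.5, 0.01),   # passes, up
    ("b", -1.0, 0.05),  # passes, down (both cutoffs are inclusive)
    ("c", 0.5, 0.001),  # fails the fold-change cutoff
    ("d", 2.0, 0.2),    # fails the p-value cutoff
]
up, down = split_by_direction(rows)
```

Whether the p-value is raw or adjusted for multiple testing is not stated above; the same filter works with either column.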

GO term assignment was established by mining for orthologous proteins through reciprocal BLAST against human and mouse proteins. Mouse and human GO terms were downloaded directly from the GOC website to obtain the most recent versions. This means a gene in the model animal will only have GO annotations if at least one gene in either human or mouse has a sufficient reciprocal best hit (RBH). This also limits the analysis to some subset of the available probes on the arrays. Human- and mouse-derived annotations are being considered individually, meaning I plan to duplicate any GO enrichment I perform, once with the mouse set and once with the human set.
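A minimal sketch of the RBH-based annotation transfer, assuming the best BLAST hit in each direction has already been parsed into a dictionary. The function names and the toy identifiers are hypothetical, and any score or coverage threshold for a "sufficient" hit is assumed to have been applied before this step.

```python
def reciprocal_best_hits(best_model_to_ref, best_ref_to_model):
    """Keep model -> reference pairs whose best hits point at each other."""
    return {
        model_gene: ref_gene
        for model_gene, ref_gene in best_model_to_ref.items()
        if best_ref_to_model.get(ref_gene) == model_gene
    }


def transfer_annotations(rbh, ref_gene_to_terms):
    """Give each model gene the GO terms of its RBH partner, if it has any."""
    return {
        model_gene: set(ref_gene_to_terms[ref_gene])
        for model_gene, ref_gene in rbh.items()
        if ref_gene in ref_gene_to_terms
    }


# Toy data: hypothetical identifiers, not real genes or terms.
best_model_to_ref = {"m1": "h1", "m2": "h2", "m3": "h2"}
best_ref_to_model = {"h1": "m1", "h2": "m3"}
ref_terms = {"h1": ["t1", "t2"], "h2": ["t3"]}

rbh = reciprocal_best_hits(best_model_to_ref, best_ref_to_model)
annotations = transfer_annotations(rbh, ref_terms)
```

Running this once with the human tables and once with the mouse tables gives the two separate annotation sets. The genes that end up in `annotations` are also the natural background set for any enrichment test.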

All of this is in a MySQL db, so it is flexible.

The idea is to use these annotations to provide a basic functional characterization of the differences between the two experimental groups.


Can you describe your study design a bit more? You have a survivor/non-survivor study, but you talk about two sets of genes. I would have thought that two groups would give only one set of genes.

written 6.7 years ago by Sean Davis (26k)

I added a better description of the situation.

written 6.7 years ago by pld (4.8k)
6.7 years ago by
paola (0) wrote:

Dear Joe,

As the previous comment says, it would help if you could describe in a bit more detail your experimental design and the analysis you've done so far to obtain your set(s?) of genes. That said, I have three general comments:

1) The version of GO that DAVID uses dates back to 2009. Many GO terms and annotations have been added in the past 4 years, especially in areas of biomedical interest that may be relevant to your study: cardiac and kidney function, apoptosis, signaling... So you may miss quite a bit by limiting yourself to DAVID. For your reference, you can download the most recent version of the ontology and annotations from the GO website, though not all third-party tools will let you use a version other than their default. You may want to check the version date used by the tool, if that information is available.

2) A GO term enrichment tool developed by the GO Consortium, which uses the most recent versions of the ontology and annotations, is available on AmiGO2. Click on 'Tools and Resources' at the top, then scroll down to 'Analysis of GO data'. Feel free to contact the GO helpdesk if you have any questions about this tool.

3) In my personal experience, statistical analysis of microarray data from human patients is hampered by the inherent variability in genetic background. There is often also redundancy among the genes involved in a pathway, so factor A may be affected in non-survivor A and factor B in non-survivor B. You wouldn't easily see that in a classical statistical test, but the end result (the pathway affected) might be the same. You may want to try Rank Product analysis and/or Gene Set Enrichment Analysis (GSEA).
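To illustrate the idea behind Rank Product analysis: each gene is ranked by fold change within every comparison, and the geometric mean of its ranks is taken, so genes that are consistently near the top get a small score. This is a rough sketch only. The function name and the input layout (gene mapped to one log fold change per comparison) are assumptions, and how to form the comparisons in an unpaired design is left open. A real analysis would also estimate significance by permutation, which is omitted here.

```python
import math


def rank_products(fold_changes):
    """Geometric mean of per-comparison ranks for each gene.

    fold_changes maps gene -> list of log fold changes, one per comparison.
    Rank 1 is the most up-regulated gene in a comparison.
    """
    genes = list(fold_changes)
    n_comparisons = len(fold_changes[genes[0]])
    log_rank_sum = {gene: 0.0 for gene in genes}
    for i in range(n_comparisons):
        ordered = sorted(genes, key=lambda g: fold_changes[g][i], reverse=True)
        for rank, gene in enumerate(ordered, start=1):
            log_rank_sum[gene] += math.log(rank)
    return {g: math.exp(log_rank_sum[g] / n_comparisons) for g in genes}


# Toy data with hypothetical gene IDs; ties are not handled specially.
scores = rank_products({"a": [3.0, 2.0], "b": [2.0, 3.0], "c": [1.0, 1.0]})
most_consistent_first = sorted(scores, key=scores.get)
```

Reversing the sort order, or negating the fold changes, gives the corresponding score for down-regulated genes.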

Hope this helps, best regards,

Paola Roncaglia (Gene Ontology Consortium)


I've updated the post a bit.

written 6.7 years ago by pld (4.8k)

Dear Joe,

Thanks for providing further details. Based on those, I would still suggest considering the points in my previous comment, even if the data come from mice instead of humans. As for the statistical test you describe, I'm not sure how many differentially expressed genes you retrieved, but you might want to relax your parameters a bit, e.g. by lowering the fold-change threshold. It is difficult to perform an enrichment analysis with a small number of genes. There is no 'absolute number' really, but you may run a few tests. Other people on this forum may provide further tips, but feel free to contact us at GO if you have any questions about the ontology or about using the tools provided by GO.
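To see why a small gene list makes enrichment hard, here is a minimal sketch of the one-sided hypergeometric test that many over-representation tools are based on. The function name and the numbers are made up for illustration, and this is not necessarily the exact test used by any particular tool.

```python
from math import comb


def enrichment_p_value(n_universe, n_term, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random.

    n_universe: annotated genes in the background
    n_term:     background genes annotated with the term
    n_selected: differentially expressed genes (within the background)
    n_overlap:  selected genes annotated with the term
    """
    upper = min(n_term, n_selected)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_selected)


# Same proportion of hits (20%) against a term covering 5% of the background,
# once with a short gene list and once with a longer one.
p_short_list = enrichment_p_value(5000, 250, 10, 2)
p_long_list = enrichment_p_value(5000, 250, 100, 20)
```

With the short list the same fourfold excess rests on only two genes, so its p-value is much larger than for the long list. Relaxing the cutoffs trades some specificity for a list long enough to test, and the p-values still need correcting for the number of terms tested.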

Best wishes, Paola

written 6.7 years ago by paola (0)