Question: Software for GO enrichment based on own DEGs and GO annotation.
wangdp123 (Oxford) wrote, 18 months ago:

Hi there,

I have assigned GO terms to each gene in my study using InterProScan, and I am going to do GO enrichment analysis based on my DEGs and my own GO annotation database.

Could anybody suggest some decent software for this purpose?

Many thanks,

Tom


Do you have a model organism or is it a neglected species?

There are many tools available, for both microarray and RNA-seq data (it is not clear from your question which you have).

For instance topGO or goseq.

Reply by b.nota, 18 months ago.

Hi, it is not a model organism.

I have annotated the GO terms for each gene in this species.

What I need to do is supply my own GO assignments, rather than a public annotation, and run enrichment analysis on my lists of up-regulated and down-regulated genes.

Reply by wangdp123, 18 months ago.
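Since the annotation here comes from InterProScan, a practical first step is collapsing its per-match output into one set of GO IDs per gene. The Python sketch below assumes the InterProScan 5 TSV output produced with the --goterms option, where column 1 is the sequence ID and column 14 holds GO IDs separated by "|" ("-" when empty). Some versions append a source tag such as "(InterPro)" to each ID, so the sketch just extracts anything matching GO:NNNNNNN. Treating sequence IDs as gene IDs is also an assumption (map isoforms to genes first if needed), and the helper name is hypothetical; check the column layout against your own output.

```python
import re
from collections import defaultdict

# Matches bare GO IDs, ignoring any "(InterPro)"-style source suffix.
GO_ID = re.compile(r"GO:\d{7}")


def gene2go_from_interproscan(lines):
    """Collapse InterProScan TSV lines into {gene_id: set of GO IDs}.

    Hypothetical helper. Assumes column 1 = sequence/gene ID and
    column 14 = GO annotations (present only when run with --goterms).
    """
    mapping = defaultdict(set)
    for line in lines:
        row = line.rstrip("\n").split("\t")
        if len(row) < 14:
            continue  # no GO column on this row
        mapping[row[0]].update(GO_ID.findall(row[13]))
    # Drop genes whose matches carried no GO terms at all.
    return {gene: gos for gene, gos in mapping.items() if gos}


# Usage (the file name is a placeholder):
# with open("proteins.fasta.tsv") as fh:
#     gene2go = gene2go_from_interproscan(fh)
```

Genes without any GO term are dropped here; whether they should still count in the background is a separate decision.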

Both goseq and topGO can handle custom GO annotations, but you need to put in a bit of effort yourself. Their manuals describe how to use data from non-model organisms with these tools.

Reply by b.nota, 18 months ago.
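For topGO specifically, custom annotations are usually loaded from a gene-to-GO mapping file with readMappings(), where each line holds a gene ID, a tab, and that gene's GO IDs separated by commas. The Python sketch below formats and parses that layout from a {gene: set of GO IDs} dictionary. The separators are an assumption to verify against the topGO manual (readMappings() takes sep and IDsep arguments), and both function names are hypothetical.

```python
def format_gene2go(mapping):
    """Render {gene: GO IDs} as 'gene<TAB>GO:1, GO:2' lines (topGO-style layout)."""
    lines = []
    for gene in sorted(mapping):
        lines.append(f"{gene}\t{', '.join(sorted(mapping[gene]))}")
    return "\n".join(lines) + "\n"


def parse_gene2go(lines):
    """Parse 'gene<TAB>GO:1, GO:2' lines back into {gene: set of GO IDs}."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        gene, _, go_field = line.partition("\t")
        mapping[gene] = {go.strip() for go in go_field.split(",") if go.strip()}
    return mapping


# Usage: write a file for topGO's readMappings() (file name is a placeholder).
# with open("gene2go.map", "w") as fh:
#     fh.write(format_gene2go(gene2go))
```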
Answer: EagleEye (Sweden) wrote, 18 months ago:

Gene Set Clustering based on Functional annotation (GeneSCF)

For custom annotations (non-model organisms), use GeneSCF v1.0.

A: GO enrichment analysis using a Text file with all the genes and GO ids associat

Answer: h.mon (Brazil) wrote, 18 months ago:

I don't know exactly where you are stuck, but several programs (for example, GSEA or GAGE) can read a custom-made GMT file and perform enrichment analysis.


That's pathway enrichment, which is different from Gene Ontology term enrichment. Loosely, pathway analysis looks at the centroid expression level of multiple genes within a given pathway, then determines the probability that it differs between the groups being contrasted. Term enrichment, on the other hand, compares the expected frequency of a given GO term in a random sample of background genes with the proportion observed in a presumably non-random selection of genes. They mean different things. For example, you'll routinely get a lot of enriched terms that don't show corresponding pathway enrichment, mainly because a lot of terms are either extremely common or extremely rare. Say you're studying cardiac cells: you'd expect to see a lot of enrichment for neuronal GO terms, because there are many overlapping signaling cascades between the two tissue types. Recently we did a GO analysis on colorectal cancer liver metastases, and the most highly enriched GO term was photoreceptor signaling, because we had a lot of RAS and PI3K activity in our samples. Now, I don't know about you, but I honestly don't think your liver is growing an eyeball, even if it has cancer.

Reply by mforde84, 18 months ago.

Hi mforde84,

Honestly, I did not get a single point from your whole comment. At first I thought you had answered in the wrong thread.

I am sure that what h.mon and I answered is relevant to the question asked here.

Reply by EagleEye, 18 months ago.

Pathway analysis or gene set enrichment is not the same as Gene Ontology term enrichment / over-representation. You're measuring two completely different things.

Sure, you can use GO terms and their constituent genes as pathways, and ask interesting questions about whether those pathways are constitutively overexpressed or downregulated compared to some baseline.

But what OP is saying is that he has a list of DEGs which he has mapped to GO annotations. Now he wants to know whether the observed proportion of mappings to a given GO term is greater than the proportion obtained using the entire population of genes in the whole transcriptome.

E.g., say I have 10 DEGs. After mapping to GO terms, I find that all 10 genes map to "cAMP Signal Transduction", out of 100 total GO term mappings for those genes. This gives us a proportion of 10/100. Now imagine we looked at all the genes in the transcriptome and asked how many map to "cAMP Signal Transduction". Let's say we find 1,000 mappings out of a total of 1,000,000,000 GO term mappings. This gives us a proportion of 1,000/1,000,000,000. So the statistical question that OP wants to answer is:

is the proportion 10/100 significantly different from 1,000/1,000,000,000?
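One common way to formalize this question is a 2×2 contingency table and a one-sided hypergeometric tail probability. The symbols are introduced here for illustration only: N is the size of the background, K the number of background items annotated to the term, n the number of selected items, and k the number of selected items annotated to the term. In the example above that would be N = 10^9, K = 1,000, n = 100 and k = 10, although most ORA tools count genes rather than GO mappings.

```latex
\begin{array}{l|cc|c}
 & \text{annotated to term} & \text{not annotated} & \text{total} \\ \hline
\text{selected} & k & n-k & n \\
\text{not selected} & K-k & N-K-n+k & N-n \\ \hline
\text{total} & K & N-K & N
\end{array}

P(X \ge k) = \sum_{i=k}^{\min(n,K)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}
```

Under random selection the expected count in the selected row is nK/N = 100 × 1,000 / 10^9 = 10^-4, against the observed 10, so the tail probability is vanishingly small.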

Another way to think of it is as a sampling experiment: if we randomly drew 10 genes from the whole transcriptome a sufficiently large number of times, the proportion of mappings to "cAMP...blahblahblah" would average ~1 mapping per 1,000,000 total mappings. If your selected gene list has a proportion much greater or smaller than this value, it's probably not a random selection of genes.

Reply by mforde84, 18 months ago.
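The random-draw view described above can be sketched directly: draw as many genes as are in the selected list, uniformly at random from the background, many times, and count how often a random draw contains at least as many term-annotated genes as the real list. The Python below is a minimal, unoptimised sketch of that idea; it counts genes rather than GO mappings, and the function name, default of 10,000 draws, fixed seed, and add-one correction are illustrative choices, not part of any tool mentioned in this thread.

```python
import random


def permutation_p_value(selected, background, term_genes, n_iter=10_000, seed=0):
    """Empirical upper-tail p-value for over-representation of one term.

    Hypothetical helper. Returns (observed count, p-value), where the p-value
    estimates P(random draw has >= observed term genes).
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    background = list(background)
    term_genes = set(term_genes)
    n = len(selected)
    observed = sum(g in term_genes for g in selected)
    hits = 0
    for _ in range(n_iter):
        draw = rng.sample(background, n)  # n genes without replacement
        if sum(g in term_genes for g in draw) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_iter + 1)


# Usage with made-up gene IDs: 10 DEGs that all carry the term,
# drawn from a 1,000-gene background where 50 genes carry it.
genes = [f"g{i}" for i in range(1000)]
print(permutation_p_value(genes[:10], genes, genes[:50]))
```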

Both programs I cited perform what is usually called "gene set enrichment analysis", or GSEA. GSEA can be used to evaluate enrichment of any gene set, such as metabolic pathways or GO categories.

Since GO categories can be used as gene sets, I think one can consider this a "term enrichment".

GO enrichment analysis based on my DEGs

This quote hints that the OP wants to use "over-representation analysis", or ORA, to test for GO term enrichment. ORA and GSEA ask the same question, but GSEA is generally considered better for a number of reasons, the most important for me being that it is more sensitive. Read a review here; note that in this review ORA is called "class I" or "singular enrichment analysis".

Reply by h.mon, 18 months ago.
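To make the ORA vs. GSEA distinction concrete: instead of a cut-off DEG list, GSEA walks down the full ranked gene list and keeps a running sum that steps up at genes in the set and down at genes outside it; the enrichment score is the maximum deviation of that sum from zero. The Python below is a minimal sketch of that running-sum score with |score|^p weighting on the hits (p = 0 gives the unweighted, Kolmogorov-Smirnov-like version). It deliberately omits the permutation-based significance testing and score normalisation that real GSEA implementations perform, and the function name is hypothetical.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Running-sum enrichment score for gene_set over a ranked gene list.

    ranked: list of (gene, score) pairs, sorted from most up- to most down-regulated.
    p: weight exponent on |score| for hits (p = 0 treats every hit equally).
    """
    gene_set = set(gene_set)
    in_set = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(in_set)
    hit_weight_total = sum(abs(s) ** p for (_, s), hit in zip(ranked, in_set) if hit)
    if n_miss == 0 or hit_weight_total == 0:
        raise ValueError("need at least one weighted hit and one non-member gene")
    running = 0.0
    es = 0.0
    for (_, score), hit in zip(ranked, in_set):
        if hit:
            running += abs(score) ** p / hit_weight_total  # step up at a member gene
        else:
            running -= 1.0 / n_miss  # step down at a non-member gene
        if abs(running) > abs(es):
            es = running  # keep the signed maximum deviation from zero
    return es


# Usage: a set concentrated at the top of the ranking scores positive.
ranked = [("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", 0.5)]
print(enrichment_score(ranked, {"a", "b"}, p=0))
```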

It sounds like OP wants to do a Fisher's exact or hypergeometric over-representation test for GO terms, filtered by DEGs. GSEA is different, though, as you say, it's preferable for a number of reasons.

Reply by mforde84, 18 months ago.
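The over-representation test mentioned here can be computed exactly with the Python standard library. The sketch below returns the upper-tail hypergeometric probability P(X ≥ k), which equals the p-value of a one-sided ("greater") Fisher's exact test on the corresponding 2×2 table. Argument names are illustrative: N is the background size, K the number of background genes annotated to the GO term, n the number of DEGs, and k the number of DEGs annotated to the term.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn).

    Hypothetical helper; uses exact integer arithmetic before the final division.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    total = comb(N, n)
    upper = min(n, K)
    # Sum exact probabilities of seeing i = k .. min(n, K) annotated genes;
    # math.comb returns 0 for impossible cases, so no extra bounds are needed.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return numerator / total


# Usage: 2 of 3 DEGs carry a term that 4 of 10 background genes carry.
print(hypergeom_upper_tail(2, 3, 4, 10))
```

In practice this is run once per GO term, and tools such as topGO or goseq take care of that bookkeeping.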
Answer: mforde84 wrote, 18 months ago:

Try http://geneontology.org/page/go-enrichment-analysis

Another popular option in R is the goana() function from limma.
