Software for GO enrichment based on own DEGs and GO annotation.
6.9 years ago
wangdp123 ▴ 340

Hi there,

I have assigned GO terms to each gene in my study using InterProScan, and I would like to do a GO enrichment analysis based on my DEGs and my own GO annotation database.

Could anybody suggest some decent software for this purpose?

Many thanks,

Tom

GO enrichment • 3.9k views

Do you have a model organism or is it a neglected species?

There are many tools available, for both microarrays and RNA-seq (it is not clear from your question which one you have).

For instance topGO or goseq.


Hi, it is not a model organism.

I have annotated the GO terms for each gene in this species.

What I need to do is supply my own GO assignments, rather than a public annotation, and run the enrichment analysis on my lists of up-regulated and down-regulated genes.


Both goseq and topGO can handle custom GO sets, but you need to put a bit of effort into it yourself. Their manuals describe how to use non-model organism data with these tools.
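To illustrate what a "custom GO set" amounts to, here is a minimal Python sketch that reads a gene-to-GO table and inverts it into a term-to-genes table. The input layout (one gene per line, a tab, then comma-separated GO IDs) is an assumption about how the InterProScan results were exported, and the gene names and GO IDs are placeholders; check each tool's manual for the exact format it expects.

```python
from collections import defaultdict
import io

def read_gene2go(handle):
    """Parse lines of the assumed form 'gene_id<TAB>GO:0000001, GO:0000002'."""
    gene2go = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        gene, _, terms = line.partition("\t")
        gene2go[gene] = {t.strip() for t in terms.split(",") if t.strip()}
    return gene2go

def invert(gene2go):
    """Turn a gene -> terms mapping into a term -> genes mapping."""
    go2gene = defaultdict(set)
    for gene, terms in gene2go.items():
        for term in terms:
            go2gene[term].add(gene)
    return dict(go2gene)

# Hypothetical annotation; geneC has no GO terms assigned.
example = io.StringIO(
    "geneA\tGO:0000001, GO:0000002\n"
    "geneB\tGO:0000002\n"
    "geneC\t\n"
)
gene2go = read_gene2go(example)
go2gene = invert(gene2go)
```

Genes without any GO term are kept with an empty set, so that they can still count towards the background later on.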

6.9 years ago
h.mon 35k

I don't know exactly where you are stuck, but several programs (for example, GSEA or GAGE) can read a custom-made GMT file and perform enrichment analysis.
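As a rough sketch of how such a GMT file could be produced from a custom annotation: each line of a GMT file holds a set name, a description, and then the member genes, all tab-separated. The `go2gene` dictionary, the gene names, and the "NA" description below are placeholders for illustration only.

```python
import io

def write_gmt(go2gene, handle, descriptions=None):
    """Write one GMT line per GO term: name, description, then member genes."""
    descriptions = descriptions or {}
    for term in sorted(go2gene):
        genes = sorted(go2gene[term])
        if not genes:
            continue  # skip terms without any annotated gene
        desc = descriptions.get(term, "NA")
        handle.write("\t".join([term, desc, *genes]) + "\n")

# Hypothetical term -> genes mapping built from a custom annotation.
go2gene = {"GO:0000002": {"geneB", "geneA"}, "GO:0000001": {"geneA"}}

buf = io.StringIO()  # replace with open("custom_go.gmt", "w") to write a file
write_gmt(go2gene, buf)
gmt_text = buf.getvalue()
```

Sorting terms and genes is not required by the format; it only makes the output reproducible.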


That's pathway enrichment, which is different from gene ontology term enrichment. Loosely, pathway analysis looks at the centroid expression level of multiple genes within a given pathway and then determines the probability that this differs between the groups being contrasted. Term enrichment, on the other hand, compares the frequency of a given GO term expected from a random sampling of background genes with the proportion observed in a presumably non-random selection of genes. They mean different things.

For example, you'll routinely get a lot of enriched terms that don't show corresponding pathway enrichment, mainly because a lot of terms are either extremely common or extremely rare. Say you're studying cardiac cells: you'd expect to see a lot of enrichment for neuronal GO terms, because there are a lot of overlapping signaling cascades between the two tissue types. E.g., we recently did a GO analysis on colorectal liver cancer mets, and the most highly enriched GO term was photoreceptor signaling, because we had a lot of RAS and PI3K activity in our samples. Now I don't know about you, but I honestly don't think your liver is growing an eyeball even if it has cancer.


Hi mforde84,

Honestly, I did not get the point of your comment. At first I thought you had answered in the wrong thread.

I am sure that what h.mon and I answered is relevant to the question asked here.


Pathway analysis or gene set enrichment is not the same as Gene Ontology term enrichment / over-representation. You're measuring two completely different things.

Sure, you can use GO terms and their constituent genes as pathways and ask interesting questions about whether those pathways are constitutively over-expressed or down-regulated compared to some baseline.

But what the OP is saying is that he has a list of DEGs which he has mapped to GO annotations. Now he wants to know whether the observed proportion of mappings to a given GO term is greater than the proportion obtained from the entire population of genes in the whole transcriptome.

E.g., say I have 10 DEGs. After mapping to GO terms, I find that all 10 genes map to "cAMP Signal Transduction", out of 100 total GO term mappings for those genes. This gives us a proportion of 10:100. Now let's imagine we looked at all the genes in the transcriptome and asked how many map to "cAMP Signal Transduction". Let's say we find 1,000 mappings out of a total of 1,000,000,000 GO term mappings. This gives us a proportion of 1,000:1,000,000,000. So the statistical question that the OP wants to answer is:

is the proportion 10:100 significantly different from 1,000:1,000,000,000?

Another way to think of it, from a sampling standpoint: if we randomly drew 10 genes from the whole transcriptome a sufficiently large number of times (N), the proportion of mappings to "cAMP...blahblahblah" would average ~1 mapping per 1,000,000 total mappings. If your selected gene list has a proportion greater or less than this value, then it's probably not a random selection of genes.
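The random-draw idea can be sketched in a few lines of Python. Note the assumptions: this version counts annotated genes rather than individual term mappings, the background of 1,000 genes with 50 annotated ones is made up for illustration, and the function name `empirical_p` is just a label chosen here.

```python
import random

def empirical_p(background, annotated, deg_list, n_draws=2000, seed=0):
    """Estimate how often a random gene list of the same size carries the
    term at least as often as the observed DEG list does."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = sum(1 for g in deg_list if g in annotated)
    at_least = 0
    for _ in range(n_draws):
        draw = rng.sample(background, len(deg_list))
        if sum(1 for g in draw if g in annotated) >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least + 1) / (n_draws + 1)

# Hypothetical data: 1,000 background genes, 50 of them carry the term,
# and 5 of the 10 DEGs carry it.
background = [f"gene{i}" for i in range(1000)]
annotated = set(background[:50])
deg_list = background[:5] + background[500:505]

observed, p_value = empirical_p(background, annotated, deg_list)
```

With enough draws, this empirical estimate should approach the tail probability that a hypergeometric test computes directly.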


Both programs I cited perform what is usually called "gene set enrichment analysis", or GSEA. GSEA can be used to evaluate enrichment of any gene set, such as metabolic pathways or GO categories.

Since GO categories can be used as gene sets, I think one can consider this "term enrichment".

"GO enrichment analysis based on my DEGs"

This quote suggests that the OP wants to use "over-representation analysis", or ORA, to investigate GO term enrichment. ORA and GSEA are asking the same question, but GSEA is generally considered better for a number of reasons; for me, the most important is that it is more sensitive. Read a review here - but note that in this review ORA is called "class I" or "singular enrichment analysis".
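To make the contrast with ORA concrete, here is a simplified Python sketch of a GSEA-style running-sum enrichment score. It uses the whole ranked gene list instead of a DEG cutoff. This is only an illustration under stated assumptions: the gene names and scores are invented, and the real GSEA software additionally estimates significance by permutation and normalizes the score, which is not done here.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Walk down a ranked gene list; step up at genes in the set (weighted by
    |score|), step down otherwise, and return the largest deviation from 0."""
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_weights = [abs(s) ** weight if h else 0.0 for s, h in zip(scores, hits)]
    total_hit = sum(hit_weights)
    if total_hit == 0 or n_miss == 0:
        return 0.0  # score is undefined without both hits and misses
    running, best = 0.0, 0.0
    for w, h in zip(hit_weights, hits):
        running += w / total_hit if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical genes ranked from most up- to most down-regulated.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]

es_top = enrichment_score(ranked, scores, {"g1", "g2"})     # set at the top
es_bottom = enrichment_score(ranked, scores, {"g5", "g6"})  # set at the bottom
```

A set concentrated at the top of the ranking gives a positive score and one concentrated at the bottom gives a negative score, without ever choosing a DEG threshold.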


It sounds like the OP wants to do a Fisher's exact or hypergeometric over-representation test for GO terms filtered by DEGs. GSEA is different, though as you say it's preferable for a number of reasons.
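For reference, a minimal Python sketch of such a hypergeometric over-representation test with a custom term-to-genes mapping might look like the following. The gene names, GO IDs, and function names are placeholders, and the Benjamini-Hochberg adjustment is added here as one common choice for multiple-testing correction, not something prescribed in this thread.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def ora(deg_list, background, go2gene):
    """One-sided over-representation test for every GO term."""
    background = set(background)
    degs = set(deg_list) & background
    results = []
    for term, genes in go2gene.items():
        genes = genes & background
        k = len(genes & degs)  # DEGs annotated with this term
        p = hypergeom_sf(k, len(background), len(genes), len(degs))
        results.append((term, k, len(genes), p))
    return sorted(results, key=lambda r: r[3])

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    prev = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        prev = min(prev, pvalues[i] * m / rank)
        adjusted[i] = prev
    return adjusted

# Hypothetical data: 20 background genes, two GO terms, 4 DEGs.
background = [f"g{i}" for i in range(20)]
go2gene = {
    "GO:0000001": {f"g{i}" for i in range(0, 5)},
    "GO:0000002": {f"g{i}" for i in range(10, 15)},
}
degs = ["g0", "g1", "g2", "g3"]

results = ora(degs, background, go2gene)
adjusted = benjamini_hochberg([r[3] for r in results])
```

The background should be the set of genes that could have been called differentially expressed (for example, all expressed genes), since the choice of background changes the p-values.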

6.9 years ago
mforde84 ★ 1.4k

Try http://geneontology.org/page/go-enrichment-analysis

A popular option in R is the goana() function from the limma package.
