Question: Too many enriched GO terms using Goseq

I have a genome of 22,129 genes and a list of 2,905 DE genes. I used goseq to perform GO enrichment analysis but got more than 500 significantly enriched GO terms (p < 0.05). How can I get a manageable number of enriched GO terms? Is it because the number of DE genes is too large?

Here's how I perform the enrichment analysis (bias.data: gene lengths, used to correct for length bias):

library(goseq)
# Fit the probability weighting function (length bias), then test each GO category.
pwf <- nullp(gene.data, bias.data = genes.bias.data, plot.fit = FALSE)
GO.wall <- goseq(pwf, gene2cat = gene2go_data, method = "Wallenius", use_genes_without_cat = FALSE)
Tags: rna-seq, go, goseq, enrichment
Written 4 months ago by tianshenbio

It is indeed probably because your list of DEGs is large. You may also notice that the top of your enrichment list is populated by big GO terms that cover many genes (hundreds), because larger categories usually have more statistical power to be detected as enriched. One thing you could do is summarize your list with a tool such as REVIGO: you input your list of enriched GO terms (with their p-values) and it collapses semantically redundant categories, giving you a smaller and more manageable view of the affected pathways.
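
A minimal sketch of preparing such an input list in R, assuming the goseq result is a data frame with `category` and `over_represented_pvalue` columns. The toy `GO.wall` values and the output file name are made up for illustration only.

```r
# Toy stand-in for the goseq() result (hypothetical values, illustration only).
GO.wall <- data.frame(
  category = c("GO:0006412", "GO:0008150", "GO:0006950", "GO:0055114"),
  over_represented_pvalue = c(1e-6, 0.20, 0.003, 0.04),
  stringsAsFactors = FALSE
)

# Keep the terms called enriched at the threshold used in the question.
enriched <- GO.wall[GO.wall$over_represented_pvalue < 0.05,
                    c("category", "over_represented_pvalue")]

# Write a two-column, tab-separated list (GO ID, p-value) for pasting into REVIGO.
revigo_file <- file.path(tempdir(), "revigo_input.txt")  # hypothetical file name
write.table(enriched, revigo_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```

With a real goseq result, only the toy `GO.wall` definition would be dropped; the rest should apply unchanged.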

(Additionally, if you have many DEGs, you could filter them by fold change and retain only the largest changes to get a smaller list.)

Written 4 months ago by Papyrus

Thank you for your suggestion. I may consider using padj<0.001 and log2FC>1 to filter my DE genes.

Written 4 months ago by tianshenbio

Keep in mind that your log2FC values will be both positive and negative (up-regulation and down-regulation, depending on how you specified the contrast). So filter on the absolute value of log2FC: log2FC > 1 is a more than 2-fold increase (FC > 2) and log2FC < -1 is a more than 2-fold decrease (FC < 0.5).
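
As a rough sketch of a filter on the absolute value, assuming a DE results table with DESeq2-style column names (`log2FoldChange`, `padj`). The table, its values, and the 0.05 cutoff are hypothetical.

```r
# Toy DE table (hypothetical values); column names assume DESeq2-style output.
res <- data.frame(
  gene = c("g1", "g2", "g3", "g4", "g5"),
  log2FoldChange = c(2.5, -1.8, 0.4, -0.2, 1.2),
  padj = c(0.001, 0.01, 0.03, 0.5, NA),
  stringsAsFactors = FALSE
)

# Keep genes that pass the adjusted p-value cutoff AND change more than 2-fold
# in either direction; abs() catches both log2FC > 1 and log2FC < -1.
# Genes with padj = NA are dropped explicitly.
keep <- !is.na(res$padj) & res$padj < 0.05 & abs(res$log2FoldChange) > 1
de_genes <- res$gene[keep]
```

Here `g2` is retained even though its log2FC is negative, which a plain `log2FoldChange > 1` test would miss.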

On a side note, if you haven't tried it, you could analyze up-regulated and down-regulated genes separately. This also gives smaller lists of DEGs, which will "clean" your GO results. Nonetheless, all of this depends on your biological question.
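
One way this split might look in R, assuming goseq's usual input of a named 0/1 vector over all genes (1 = DE). The gene IDs, values, and the names `gene.data.up` / `gene.data.down` are hypothetical.

```r
# Toy inputs (hypothetical): all assayed genes plus an already-filtered DE table.
all_genes <- c("g1", "g2", "g3", "g4", "g5")
de <- data.frame(
  gene = c("g1", "g2", "g5"),
  log2FoldChange = c(2.5, -1.8, 1.2),
  stringsAsFactors = FALSE
)

# Split the DE genes by direction of change.
up_genes   <- de$gene[de$log2FoldChange > 0]
down_genes <- de$gene[de$log2FoldChange < 0]

# Build one named 0/1 vector per direction, covering every assayed gene.
gene.data.up   <- setNames(as.integer(all_genes %in% up_genes), all_genes)
gene.data.down <- setNames(as.integer(all_genes %in% down_genes), all_genes)
```

Each vector could then be passed to `nullp()` in place of `gene.data`, giving one enrichment run per direction.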

Personally, I believe that fold-change filtering is the more biologically sound choice, which is probably what you are aiming for in subsequent pathway enrichment analyses. Once you have the "safety" of your multiple-testing correction (your p-adj) you're OK to go; lowering the p-adj cutoff further will probably bias the list toward low-variance genes rather than genes with strong changes.

Written 4 months ago by Papyrus

Thank you for your answer! Actually, I have played around with all these factors; it's just not easy to decide what the best criteria would be for this kind of analysis...

Written 4 months ago by tianshenbio