What to choose as background genes in GO enrichment analysis
3.9 years ago
tianshenbio ▴ 170

I tried both clusterProfiler and goseq for GO enrichment analysis. However, goseq reported 4-5 times more enriched GO terms than clusterProfiler, since the p-values calculated by goseq are generally smaller. This is because I used all genes in the genome as background genes, so the background ratio is small and the p-values are smaller as well. Which genes should I use as background genes? Here are some of my options:

  1. All genes in the genome
  2. All genes with GO terms in the genome
  3. Genes detected in my samples
  4. Genes detected in my samples with GO terms
RNA-Seq go enrichment goseq clusterprofiler • 3.8k views

I personally use the genes that were actually used in the DE analysis, so in my case all genes that survive the filterByExpr step of edgeR.


I would use 3 or 4 depending on your input list: if it contains only genes with GO terms, then 4; if not (which is the correct way IMHO), then 3.

3.9 years ago
Papyrus ★ 2.9k

I would use all of the genes that were analyzed in your experiment, i.e. those on which you performed the differential expression analysis (possibly after filtering out low-expression genes, etc.), because (under the assumption of independence) those are the ones that had a chance of appearing as DEGs.
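To make the effect of the background concrete, here is a minimal Python sketch of a plain hypergeometric over-representation test evaluated with two different backgrounds. The function name `enrichment_pvalue` and all gene counts are made up for illustration. It is not the test goseq runs by default (goseq additionally corrects for gene length bias), so treat it only as a rough picture of why a larger background tends to give smaller p-values.

```python
from math import comb

def enrichment_pvalue(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    k: DE genes annotated with the term
    n: total DE genes
    K: background genes annotated with the term
    N: total background genes
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total

# Hypothetical numbers: 200 DE genes, 20 of them carry the GO term.
# Background A: whole genome (20000 genes, 300 carry the term).
# Background B: genes actually tested for DE (12000 genes, 250 carry the term).
p_genome = enrichment_pvalue(k=20, n=200, K=300, N=20000)
p_tested = enrichment_pvalue(k=20, n=200, K=250, N=12000)

print(f"whole genome as background: p = {p_genome:.3g}")
print(f"tested genes as background: p = {p_tested:.3g}")
```

With these toy numbers the same 20 hits look more surprising against the whole genome, because genes that could never have been called DE dilute the expected number of hits.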

As for filtering out genes that do not map to any GO term, you can control this at the GO enrichment step: the goseq function has a use_genes_without_cat argument for it, and by default (in the most recent version) these genes are ignored in the enrichment testing.
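As a rough illustration of what ignoring unannotated genes does to the counts that go into the test, here is a small Python sketch. The function `term_counts`, the gene IDs and the term `GO:A` are all hypothetical; the keyword argument only borrows its name from goseq's use_genes_without_cat and this is not goseq's implementation.

```python
def term_counts(term, de_genes, universe, gene2terms, use_genes_without_cat=False):
    """Return (k, N, K, n) for one term.

    k: DE genes with the term, N: background size,
    K: background genes with the term, n: DE genes in the background.
    """
    universe = set(universe)
    de = set(de_genes) & universe
    if not use_genes_without_cat:
        # Drop genes with no annotation at all from both the background and the DE list.
        annotated = {g for g in universe if gene2terms.get(g)}
        universe = annotated
        de = de & annotated
    in_term = {g for g in universe if term in gene2terms.get(g, ())}
    return len(de & in_term), len(universe), len(in_term), len(de)

# Toy data: 10 tested genes, only g1..g6 have any GO annotation.
tested = [f"g{i}" for i in range(1, 11)]
gene2terms = {
    "g1": {"GO:A"}, "g2": {"GO:A"}, "g3": {"GO:A"},
    "g4": {"GO:B"}, "g5": {"GO:B"}, "g6": {"GO:B"},
}
de_genes = ["g1", "g2", "g7", "g8"]

ignored = term_counts("GO:A", de_genes, tested, gene2terms)
kept = term_counts("GO:A", de_genes, tested, gene2terms, use_genes_without_cat=True)
print("unannotated genes ignored:", ignored)
print("unannotated genes kept:   ", kept)
```

Ignoring unannotated genes corresponds to option 4 in the question and keeping them corresponds to option 3, so the choice can be made at the enrichment step without pre-filtering the background by hand.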


Thank you for your answer! Very helpful
