Question

Over-representation Gene Ontology Analysis on Subset of DE Genes

3.0 years ago

lucia.liu54 • 0

I have run a likelihood ratio test on a three-condition comparison and obtained a very large number of significantly differentially expressed (DE) genes (>8,000). After performing over-representation analysis (ORA) on all DE genes, I found no significantly over-represented GO terms. I assume this is because the significant list is about half of the background (all genes tested for differential expression). Would it be statistically incorrect to take a subset of the top results (say, the top 1,000 DE genes by adjusted p-value) and perform over-representation analysis on that subset? It seems wrong to use only a portion of the significant results, but as a student with limited statistical knowledge I wanted to check.
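
As background for the question above: ORA tools commonly score each GO term with a one-sided hypergeometric (Fisher's exact) test. The sketch below is a minimal standard-library version; the function names and the gene counts in the usage lines are hypothetical, and individual ORA packages may differ in details such as how the background is defined and how multiple testing is corrected. It also makes one limitation concrete: a term's fold enrichment, (k/n)/(K/N), can never exceed N/n, so when the DE list is half the background no term can be more than 2-fold enriched.

```python
from math import comb


def ora_pvalue(n_background, n_term, n_de, n_de_in_term):
    """One-sided hypergeometric over-representation p-value, P(X >= k).

    n_background: N, all genes tested (the background / universe)
    n_term:       K, background genes annotated to the GO term
    n_de:         n, genes in the DE list
    n_de_in_term: k, DE genes annotated to the term
    """
    total = comb(n_background, n_de)
    upper = min(n_term, n_de)
    tail = sum(
        comb(n_term, i) * comb(n_background - n_term, n_de - i)
        for i in range(n_de_in_term, upper + 1)
    )
    return tail / total  # exact integer ratio converted to a float


def max_fold_enrichment(n_background, n_de):
    """Upper bound on (k/n)/(K/N): reached only if every term gene is DE."""
    return n_background / n_de


# Hypothetical numbers: 16,000 tested genes and a GO term with 100 members.
N, K = 16_000, 100
print(max_fold_enrichment(N, 8_000))  # ceiling when half of all genes are DE
print(ora_pvalue(N, K, 8_000, 60))    # 60 of the term's genes in a list of 8,000
print(ora_pvalue(N, K, 1_000, 15))    # 15 of the term's genes in a list of 1,000
```

Note that the last line only shows how the arithmetic changes with list size; whether a top-N subset is a defensible choice is the statistical question being asked in this thread.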

GO LRT

score 2 · Answer 1 · 2021-05-06

3.0 years ago

jared.andrews07 ★ 16k

Cases like this are where GSEA really shines: it takes all genes into account and doesn't require you to decide which genes are differentially expressed, only to rank them (logFC * -log10(pvalue) works rather well as the ranking metric).
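
A rough sketch of the two GSEA ingredients mentioned above, assuming per-gene logFC and p-values are available: the logFC * -log10(pvalue) ranking metric, and a GSEA-style weighted running-sum enrichment score computed over that ranking. All names and the four-gene example are hypothetical, and this is not the implementation used by any particular GSEA package; in particular it computes no permutation-based p-values.

```python
import math


def rank_metric(log_fc, pvalue, floor=1e-300):
    """Signed ranking score logFC * -log10(p); p is floored so p == 0 stays finite."""
    return log_fc * -math.log10(max(pvalue, floor))


def ranked_genes(stats):
    """stats maps gene -> (logFC, pvalue); returns [(gene, score)] sorted high to low."""
    scored = {gene: rank_metric(fc, p) for gene, (fc, p) in stats.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


def enrichment_score(ranked, gene_set, weight=1.0):
    """GSEA-style running-sum enrichment score (no permutation p-value).

    Walking down the ranked list, the sum steps up at genes in the set
    (by |score|**weight, normalized over the set) and down by a constant
    at genes outside it. The ES is the largest deviation from zero.
    """
    n_hits = sum(1 for gene, _ in ranked if gene in gene_set)
    if n_hits == 0 or n_hits == len(ranked):
        raise ValueError("gene set must overlap the ranking without covering all of it")
    hit_norm = sum(abs(score) ** weight for gene, score in ranked if gene in gene_set)
    if hit_norm == 0:
        raise ValueError("all genes in the set have a score of zero")
    miss_step = 1.0 / (len(ranked) - n_hits)
    running, best = 0.0, 0.0
    for gene, score in ranked:
        if gene in gene_set:
            running += abs(score) ** weight / hit_norm
        else:
            running -= miss_step
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical DE results: gene -> (logFC, p-value).
example_stats = {
    "A": (2.0, 0.01),
    "B": (1.0, 0.1),
    "C": (-1.0, 0.1),
    "D": (-2.0, 0.01),
}
example_ranking = ranked_genes(example_stats)
print(example_ranking)
print(enrichment_score(example_ranking, {"A", "B"}))  # set concentrated at the top
print(enrichment_score(example_ranking, {"C", "D"}))  # set concentrated at the bottom
```

Because every gene contributes to the running sum, there is no need to pick a DE cutoff at all, which is what makes this approach attractive when thousands of genes pass the significance threshold.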

Alternatively, rather than arbitrarily cherry-picking genes post hoc, you can be more stringent during DEG calling by using lfcThreshold (DESeq2) or the lfc argument of glmTreat (edgeR). These test whether each gene's fold change is significantly larger in magnitude than the threshold, rather than merely different from zero.
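
The idea of testing against a fold-change threshold can be illustrated with a simplified normal-approximation (Wald-type) sketch, similar in spirit to DESeq2's lfcThreshold with its default "greaterAbs" alternative. It is not code from either package, and glmTreat in particular constructs its test differently; the function names and example numbers are hypothetical.

```python
import math


def upper_normal_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def wald_pvalue_zero(lfc, se):
    """Two-sided Wald test of H0: LFC = 0 (the usual DE test)."""
    return min(1.0, 2.0 * upper_normal_tail(abs(lfc) / se))


def wald_pvalue_threshold(lfc, se, threshold):
    """Test of H0: |LFC| <= threshold, normal approximation.

    Only the part of |LFC| beyond the threshold counts as evidence; if
    |LFC| is below the threshold the p-value is capped at 1.
    """
    excess = abs(lfc) - threshold
    return min(1.0, 2.0 * upper_normal_tail(excess / se))


# Hypothetical gene: a tiny but very precisely estimated log2 fold change.
lfc, se = 0.1, 0.01
print(wald_pvalue_zero(lfc, se))                      # vanishingly small
print(wald_pvalue_threshold(lfc, se, threshold=0.5))  # 1.0
```

The example shows why thresholding helps here: with enough precision, a gene with a negligible fold change is "significant" against zero but not against a log2 threshold of 0.5 (roughly a 1.4-fold change).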

Thank you for the insight. My issue is that the likelihood ratio test I used only assigns an adjusted p-value that can be used to filter for significance (there is no single logFC per gene, because it is a multi-group test). Even with an extremely stringent adjusted p-value cutoff, a very large number of genes remain significant.
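
To illustrate why an LRT yields a p-value but no single fold change: with three conditions, a full model such as ~ condition has two more parameters than the reduced model ~ 1, and the test only asks whether those extra parameters improve the fit, i.e. whether any of the group means differ. The sketch below assumes exactly that design, so the statistic is compared with a chi-square distribution on 2 degrees of freedom, then applies the usual Benjamini-Hochberg adjustment. The log-likelihood values are hypothetical; in practice the package computes them from its negative binomial fits, and the function names are illustrative only.

```python
import math


def lrt_pvalue_df2(loglik_full, loglik_reduced):
    """LRT p-value when the full model has 2 more parameters than the reduced one.

    D = 2 * (loglik_full - loglik_reduced) is compared with a chi-square
    distribution on 2 degrees of freedom, whose survival function is exp(-D / 2).
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return math.exp(-stat / 2.0)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene log-likelihoods: (full model ~ condition, reduced model ~ 1).
fits = {"gene1": (-50.0, -80.0), "gene2": (-61.0, -64.0), "gene3": (-70.0, -70.5)}
raw = [lrt_pvalue_df2(full, reduced) for full, reduced in fits.values()]
for gene, padj in zip(fits, benjamini_hochberg(raw)):
    print(gene, padj)
```

Since D grows with the amount of information per gene, even modest differences between groups can produce very small p-values when counts are high, which is consistent with a strict cutoff still leaving thousands of genes.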

reply by lucia.liu54 • 3.0 years ago

Is there a specific reason you need to use an LRT? Either way, in edgeR you can still use glmTreat on a glmFit model, which performs a modified LRT against the fold-change threshold. See the Details section of the glmTreat documentation for more information.

reply by jared.andrews07 ★ 16k • 3.0 years ago