Over-representation Gene Ontology Analysis on Subset of DE Genes
3.0 years ago

I ran a likelihood ratio test (LRT) on a three-condition comparison and obtained a very large number of significantly differentially expressed (DE) genes (>8,000). When I performed over-representation analysis on all DE genes, no GO terms were significantly over-represented. I assume this is because the significant list is about half of the background, i.e. all genes tested for differential expression. Would it be statistically invalid to subset the top results (say, the top 1,000 DE genes by adjusted p-value) and perform over-representation analysis on that subset? It seems incorrect to take only a portion of the significant results, but as a student with limited statistical knowledge I wanted to check.
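To make the concern about list size concrete, here is a minimal sketch of the one-sided hypergeometric test that over-representation tools are commonly built on. It is written in plain Python rather than with any GO package, and the function names and all gene counts (a background of 16,000 genes, a GO term with 200 genes, the overlaps) are made-up assumptions for illustration, not values from this analysis. It also shows that when the DE list is half the background, a term's fold enrichment cannot exceed 2.

```python
from math import comb


def ora_pvalue(background, de_genes, term_genes, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    background: number of genes tested (hypothetical value in the example)
    de_genes:   size of the DE list
    term_genes: number of background genes annotated to the GO term
    overlap:    DE genes annotated to the term
    """
    max_overlap = min(de_genes, term_genes)
    total = comb(background, de_genes)
    tail = sum(
        comb(term_genes, k) * comb(background - term_genes, de_genes - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def max_fold_enrichment(background, de_genes, term_genes):
    """Largest possible (observed / expected) overlap for a term."""
    expected = de_genes * term_genes / background
    return min(de_genes, term_genes) / expected


# Illustrative numbers only: 16,000 genes tested, a GO term with 200 genes.
# Case A: 8,000 DE genes (half the background), 110 of them in the term.
print(max_fold_enrichment(16000, 8000, 200))
print(ora_pvalue(16000, 8000, 200, 110))

# Case B: 1,000 DE genes, 25 of them in the term.
print(max_fold_enrichment(16000, 1000, 200))
print(ora_pvalue(16000, 1000, 200, 25))
```

The ceiling on fold enrichment in case A is pure arithmetic (background size divided by list size). Whether that is what actually suppressed significant terms here is still only an assumption.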

GO LRT • 1.1k views
3.0 years ago

Cases like this are where GSEA really shines, as it takes all genes into account and doesn't require you to manually specify which genes are differentially expressed - only to rank them (for which logFC * -log10(pvalue) works rather well).
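As a rough sketch of the ranking step, the metric mentioned above (logFC * -log10(pvalue)) can be computed and sorted like this. It is plain Python, the function name `rank_for_gsea`, the `min_pvalue` floor and the example genes are all hypothetical, and it assumes you have a per-gene logFC from some pairwise contrast.

```python
import math


def rank_for_gsea(results, min_pvalue=1e-300):
    """Return (gene, score) pairs sorted from most up- to most down-regulated.

    results: iterable of (gene, log_fc, pvalue) tuples.
    min_pvalue: assumed floor so that a p-value of 0 does not give log10(0).
    """
    ranked = []
    for gene, log_fc, pvalue in results:
        p = max(pvalue, min_pvalue)
        score = log_fc * -math.log10(p)  # sign comes from logFC
        ranked.append((gene, score))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Made-up example values, not real results.
example = [
    ("geneA", 2.0, 1e-8),
    ("geneB", -1.5, 1e-4),
    ("geneC", 0.1, 0.5),
    ("geneD", 3.0, 0.01),
]

for gene, score in rank_for_gsea(example):
    print(gene, round(score, 3))
```

The resulting ranked list is the kind of input a pre-ranked GSEA run expects. The exact file format depends on the tool you use.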

Alternatively, you can be more stringent during DEG calling by using an lfcThreshold (DESeq2) or lfc (edgeR, glmTreat) rather than arbitrarily cherry-picking post hoc. This tests whether each gene's fold change significantly exceeds the threshold, rather than merely whether it differs from zero.


Thank you for the insight. My issue is that the likelihood ratio test I used only assigns an adjusted p-value that can be used to filter for significance (there is no logFC associated with the genes as it is a multi-group test). Even if I make the adjusted p-value ridiculously low, there is still a very large number of differentially expressed genes.


Is there a specific reason you have to use an LRT? Regardless, you can still use glmTreat with a glmFit model and get a modified LRT against the threshold in edgeR. See the glmTreat details for more info.
