Selecting a subset of MSigDB database for GSEA
3 months ago
Orange ▴ 30

Hi all

I am analyzing bulk RNA-seq data with GSEA and MSigDB to identify significantly enriched pathways.

I am interested in which signaling pathways are enriched, so I am planning to use "C2: curated gene sets", its subcollection "CP: Canonical pathways", or the "KEGG_MEDICUS subset of CP".

However, these collections contain many pathways that I am not interested in.

Inflating the number of pathways tested increases the number of hypothesis tests performed, which unnecessarily penalizes all relevant tests when controlling the Type I error rate. Q1) Is it acceptable to further subset these collections/subcollections to the pathways I am interested in testing?

Or do I need to stick with the entire collection generated by the Broad Institute?

From "C7: immunologic signature gene sets", there are certain studies and gene sets that I would like to test for enrichment in our experiment. Q2) Can I just test those individually in GSEA, or do I need to feed in the entire C7 collection?
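Mechanically, testing only selected gene sets means handing the GSEA tool a smaller gene set file. Below is a minimal Python sketch, assuming MSigDB's tab-separated GMT layout (set name, description, then one gene per field). The set names and genes are hypothetical placeholders, and `read_gmt`, `subset_gene_sets` and `to_gmt_lines` are illustrative helpers, not part of any library.

```python
def read_gmt(lines):
    """Parse GMT-formatted lines into {gene_set_name: set_of_genes}."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, genes = fields[0], fields[2:]  # fields[1] is the description/URL
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


def subset_gene_sets(gene_sets, names=(), prefixes=()):
    """Keep gene sets listed by exact name or whose name starts with a prefix."""
    wanted = set(names)
    prefixes = tuple(prefixes)
    return {
        name: genes
        for name, genes in gene_sets.items()
        if name in wanted or name.startswith(prefixes)
    }


def to_gmt_lines(gene_sets, description="na"):
    """Serialize {name: genes} back to GMT lines (genes sorted for reproducibility)."""
    return [
        "\t".join([name, description, *sorted(genes)]) + "\n"
        for name, genes in gene_sets.items()
    ]


# Hypothetical GMT content standing in for a downloaded MSigDB file
example_gmt = [
    "KEGG_MEDICUS_EXAMPLE_PATHWAY_1\tna\tGENE1\tGENE2\tGENE3\n",
    "REACTOME_EXAMPLE_PATHWAY\tna\tGENE2\tGENE4\n",
    "GSE00000_EXAMPLE_UP\tna\tGENE5\tGENE6\n",
]
all_sets = read_gmt(example_gmt)
chosen = subset_gene_sets(
    all_sets, names=["GSE00000_EXAMPLE_UP"], prefixes=["KEGG_MEDICUS_"]
)
print(sorted(chosen))  # ['GSE00000_EXAMPLE_UP', 'KEGG_MEDICUS_EXAMPLE_PATHWAY_1']
# To save (hypothetical file name):
# open("selected_sets.gmt", "w").writelines(to_gmt_lines(chosen))
```

In R, `fgsea::gmtPathways()` reads a GMT file into a named list, which can be subset by name in the same way before calling `fgsea()`. Keep in mind that with a subset, the reported FDR values are computed over the chosen sets only.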


fgsea clusterprofiler GSEA

Q1: Yes, I personally think meaningful subsetting will help reduce the multiple-testing burden.
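To make the multiple-testing argument concrete, here is a minimal Python sketch of the Benjamini-Hochberg (BH) adjustment, one common FDR correction (fgsea's `padj` column, for instance, is BH-adjusted; the Broad GSEA software instead estimates its FDR q-values from permutations, so this only illustrates the general principle). All p-values are made up: the same five raw p-values are adjusted once on their own and once alongside 195 hypothetical unrelated pathways that all have p = 0.5.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for 5 pathways of interest
of_interest = [0.001, 0.004, 0.01, 0.03, 0.2]
# Hypothetical: 195 unrelated pathways with an uninformative p = 0.5
others = [0.5] * 195

subset_adj = bh_adjust(of_interest)
full_adj = bh_adjust(of_interest + others)[:5]  # same 5 pathways, 200 tests
print([round(p, 4) for p in subset_adj])  # [0.005, 0.01, 0.0167, 0.0375, 0.2]
print([round(p, 4) for p in full_adj])    # [0.2, 0.4, 0.5, 0.5, 0.5]
```

The pathways with raw p = 0.001 and 0.004 pass a 5% FDR cutoff when tested as a small subset but not when diluted among 200 tests. The subset is only a fair test if it was chosen before looking at the results; picking pathways after seeing which ones look enriched would undo the benefit.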

Q2: Whatever floats your boat. Personally, I think these enrichment analyses are a mess anyway, both statistically and because the collections are not really standardized (REACTOME terms, for example, sometimes contain genes that cannot be mapped because of typos), and the included genes are (heavily) redundant between pathways.
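Both problems can at least be checked before running the analysis. The Python sketch below, using hypothetical gene sets and an arbitrary 0.5 cutoff, applies the Jaccard index to flag heavily overlapping pathways and reports what fraction of each set's genes are present among the measured genes; a low fraction can point to unmappable identifiers such as the typos mentioned above. The helper names are illustrative, not from any package.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two gene sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def redundant_pairs(gene_sets, threshold=0.5):
    """Pairs of gene sets whose Jaccard overlap is at least `threshold` (arbitrary cutoff)."""
    pairs = []
    for (name_a, genes_a), (name_b, genes_b) in combinations(sorted(gene_sets.items()), 2):
        score = jaccard(genes_a, genes_b)
        if score >= threshold:
            pairs.append((name_a, name_b, round(score, 3)))
    return pairs


def mapping_rate(gene_sets, universe):
    """Fraction of each gene set's genes found among the measured genes."""
    return {name: len(genes & universe) / len(genes) for name, genes in gene_sets.items() if genes}


# Hypothetical gene sets; "GENE8_TYPO" stands in for an unmappable symbol
example_sets = {
    "PATHWAY_A": {"GENE1", "GENE2", "GENE3", "GENE4"},
    "PATHWAY_B": {"GENE1", "GENE2", "GENE3", "GENE5"},
    "PATHWAY_C": {"GENE6", "GENE7", "GENE8_TYPO"},
}
measured = {"GENE1", "GENE2", "GENE3", "GENE4", "GENE5", "GENE6", "GENE7"}

print(redundant_pairs(example_sets))  # [('PATHWAY_A', 'PATHWAY_B', 0.6)]
print(mapping_rate(example_sets, measured))
# {'PATHWAY_A': 1.0, 'PATHWAY_B': 1.0, 'PATHWAY_C': 0.6666666666666666}
```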

After all, these analyses may suggest something, and that always needs to be confirmed by other analyses or experiments. I would never see term enrichment of any kind as "proof" of anything.


The following is definitely worth a read if you are doing gene set enrichment analyses:

Urgent need for consistent standards in functional enrichment analysis


Thank you both for your suggestions. I will keep those points in mind and will also try to follow the best practices outlined in the linked paper.

