I am analyzing bulk RNA-seq data with GSEA and MSigDB to identify significantly enriched pathways.
Because I am interested in which signaling pathways are enriched, I plan to use "C2: curated gene sets", its subcollection "CP: Canonical pathways", or the narrower "KEGG_MEDICUS subset of CP".
However, these collections contain many pathways that I am not interested in.
Testing more pathways increases the number of hypothesis tests performed, and the multiple-testing correction used to control the Type I error rate then penalizes every test, including the relevant ones. Q1) Given that, is it acceptable to further subset these collections/subcollections to only the pathways I am interested in testing?
Or do I need to use the entire collection as distributed by the Broad Institute?
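To make the multiple-testing concern concrete, here is a minimal Python sketch of the Benjamini-Hochberg (BH) procedure applied to made-up p-values. This is only an analogy: GSEA's own FDR q-value is estimated from a permutation null distribution of normalized enrichment scores pooled across all gene sets tested, not by BH, although it likewise depends on which gene sets are included. The function name and all p-values below are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    # and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for three pathways of interest.
relevant = [0.001, 0.01, 0.02]
print(benjamini_hochberg(relevant))

# The same three pathways tested alongside 97 hypothetical unrelated ones.
background = [0.5] * 97
print(benjamini_hochberg(relevant + background)[:3])
```

With three tests the adjusted values come out at roughly 0.003, 0.015 and 0.02; with 100 tests the same raw p-values become roughly 0.1, 0.5 and 0.5.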
From "C7: immunologic signature gene sets", there are specific studies and gene sets that I would like to test for enrichment in our experiment. Q2) Can I test just those gene sets individually in GSEA, or do I need to supply the entire C7 collection?
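If subsetting is acceptable, whether for a CP subset (Q1) or selected C7 sets (Q2), one way to do it is to filter the MSigDB GMT file down to the chosen gene-set names and give the smaller file to GSEA as a custom gene set file. In the GMT format each line holds one gene set: its name, a description, then the member genes, all tab-separated. Below is a minimal Python sketch; the function names, file paths, and set names (`SET_A`, `SET_X`) are hypothetical placeholders.

```python
def filter_gmt(lines, keep):
    """Keep GMT records whose gene-set name (first tab-separated field) is in `keep`.

    Returns the kept lines and the set of requested names that were not found,
    which helps catch typos in gene-set names.
    """
    kept, found = [], set()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        name = line.split("\t", 1)[0]
        if name in keep:
            kept.append(line)
            found.add(name)
    return kept, set(keep) - found


def filter_gmt_file(src_path, dst_path, keep):
    """Write a smaller GMT file containing only the requested gene sets."""
    with open(src_path) as src:
        kept, missing = filter_gmt(src, keep)
    with open(dst_path, "w") as dst:
        dst.write("\n".join(kept) + "\n")
    return missing


# Hypothetical in-memory example; real MSigDB names and paths would go here.
example = ["SET_A\tdescription\tGENE1\tGENE2\n", "SET_B\tdescription\tGENE3\n"]
kept, missing = filter_gmt(example, {"SET_A", "SET_X"})
print(kept, missing)
```

Reporting missing names matters because gene-set names must match the GMT file exactly; a silently dropped set would otherwise go unnoticed.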