Question

enrichment analysis using cluster profiler

3.8 years ago

adR ▴ 120

Hello everyone, I would like to ask about an enrichment analysis result. I ran a KEGG hypergeometric test using the Bioconductor package clusterProfiler. The p-values were not adjusted (adjustment method 'none'), and with a cutoff of < 0.5 I got a number of pathways, most of which are interesting. Using a p-value cutoff of < 0.05 I got very few pathways, and they are not interesting for my case. So I am wondering whether I can simply pick the pathways I am interested in from the many pathways I found using the non-adjusted p-value with a cutoff of 0.5. I am not sure if I can do that. I would appreciate your feedback! Best, Amare

clusterProfiler R Bioconductor • 1.7k views

updated 3.8 years ago by Papyrus ★ 2.9k • written 3.8 years ago by adR ▴ 120

score 4 · Answer 1 · 2020-07-15

(Even without going into the multiple testing problem and why you should adjust your p-values):

Your hypergeometric test looks for an association: in this case, whether your group of genes overlaps a pathway more than expected by chance (i.e. the pathway is enriched) under the null hypothesis of no association/enrichment. If the threshold for your unadjusted p-value is 0.5, you are calling "significant" a result that is expected to appear in your data, by chance, up to 50% of the time even when that pathway is not enriched. I would strongly advise against doing this.
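To make the point about the 0.5 threshold concrete, here is a minimal sketch in plain Python (not clusterProfiler itself) of the upper-tail hypergeometric p-value that this kind of over-representation test is based on. All the numbers (background size, gene list size, pathway size, number of pathways tested) are made-up assumptions for illustration only.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail p-value P(X >= k), where X is the number of pathway genes
    found among n selected genes drawn without replacement from N background
    genes, K of which belong to the pathway."""
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical numbers, chosen only for illustration.
N = 20000  # background genes
n = 200    # genes in the list being tested
K = 100    # genes annotated to one pathway

expected_overlap = n * K / N  # overlap expected by chance alone (1 gene here)

p_two = hypergeom_enrichment_p(2, n, K, N)   # overlap of 2 genes
p_five = hypergeom_enrichment_p(5, n, K, N)  # overlap of 5 genes

# If none of the tested pathways were truly enriched, the number expected to
# pass an unadjusted cutoff by chance is at most (number of tests) * cutoff.
n_pathways = 300  # hypothetical number of pathways tested
expected_fp_loose = n_pathways * 0.5
expected_fp_strict = n_pathways * 0.05

print(f"expected overlap by chance: {expected_overlap}")
print(f"p-value for overlap of 2: {p_two:.3f}")
print(f"p-value for overlap of 5: {p_five:.5f}")
print(f"chance hits at most, cutoff 0.5: {expected_fp_loose:.0f}")
print(f"chance hits at most, cutoff 0.05: {expected_fp_strict:.0f}")
```

With these assumed numbers, an overlap of only 2 genes, where 1 is expected by chance, gives a p-value of roughly 0.26 and would pass a 0.5 cutoff, while an overlap of 5 is needed to get well below 0.05. This is why a list of pathways filtered at 0.5 is mostly indistinguishable from noise.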

Additionally, on a more personal note, I would suggest that the approach to analyzing statistical results should not be one of "picking" the results that are interesting to us, because that leads to many types of bias (see this). We perform the tests in as unbiased a manner as possible, and then accept the results that we get and interpret them.

If you are specifically interested in particular pathways, your analysis approach should be different. For starters, you may predefine which genes or groups of genes you are interested in and then, in an exploratory manner, look at them jointly (in a heatmap, etc.) to see how they, in particular, behave. However, I suspect your data comes from a technology such as RNA-seq or arrays, where all ~20,000 genes have been measured, so this last approach also has to be done carefully in order to avoid bias.
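One possible way to act on the "predefine" idea, offered only as a sketch and not as the method described above: fix the list of pathways of interest before looking at any results, test only those, and adjust the p-values across that fixed list. The example below implements the Benjamini-Hochberg adjustment (the same procedure as the "BH" method of R's p.adjust) in plain Python. The pathway names and p-values are invented placeholders.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical pathways chosen BEFORE seeing any enrichment results,
# with made-up unadjusted p-values.
prespecified = {
    "pathway_A": 0.004,
    "pathway_B": 0.03,
    "pathway_C": 0.20,
    "pathway_D": 0.45,
}

names = list(prespecified)
adjusted = dict(zip(names, benjamini_hochberg([prespecified[x] for x in names])))
significant = [name for name in names if adjusted[name] < 0.05]

for name in names:
    print(f"{name}: raw={prespecified[name]:.3f} adjusted={adjusted[name]:.3f}")
print("significant at adjusted p < 0.05:", significant)
```

The adjustment is only valid if the list really is fixed in advance. Choosing the pathways after browsing the full result table and then adjusting over that smaller set reintroduces the selection bias discussed above.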