Question

Determining Significance of Enrichment Analysis Within Clustering to Compare Clustering Methods

5.3 years ago

adavi140 • 0

Hello,

I have a dataset containing drugs and their corresponding indications. Using this dataset, I applied three different clustering methods to group drugs with similar indications. To identify which method produced the clustering that best reflects my data, I ran an enrichment analysis of drug therapeutic classes for each cluster and method using Fisher's exact test.

I now have a p-value for each cluster. I am unsure how to identify which p-values actually represent a significant enrichment of a drug class within a cluster, and how to compare the enrichment results between clustering methods.

R • 850 views

updated 5.3 years ago by Jean-Karim Heriche 27k • written 5.3 years ago by adavi140 • 0

score 0 · Answer 1 · 2019-01-07

5.3 years ago

Jean-Karim Heriche 27k

If the goal of the clustering is to recover drug indications as closely as possible, then I would simply go for the clustering with the best purity relative to indication. To define a statistically significant enrichment you need to set a p-value threshold (and don't forget to account for multiple testing). However, having a p-value for each cluster doesn't help in comparing clustering methods, since it doesn't tell you about effect size (i.e. how many drugs of a given class are in a cluster). As a hypothetical example, you could have a cluster with five drugs with indication A and two drugs with indication B, and the enrichment test could return indication B as significantly enriched but not A, e.g. because B is rare in the whole dataset while A is common.
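To make the purity suggestion and the hypothetical example concrete, here is a minimal sketch in plain Python. It assumes each drug carries a single indication label, and the background counts (100 drugs in total, 40 with indication A, 2 with indication B) are invented for illustration; `purity` and `enrichment_p` are hypothetical helper names. The enrichment p-value is the one-sided Fisher's exact test, written as a hypergeometric upper tail.

```python
from collections import Counter
from math import comb


def purity(clusters):
    """Fraction of drugs that belong to the majority class of their cluster.

    clusters: list of lists of class labels, one inner list per cluster.
    """
    total = sum(len(c) for c in clusters)
    majority = sum(Counter(c).most_common(1)[0][1] for c in clusters if c)
    return majority / total


def enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value, P(X >= k).

    k: drugs of the class in the cluster, n: cluster size,
    K: drugs of the class in the whole dataset, N: all drugs.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical cluster of 7 drugs: five with indication A, two with B.
cluster = ["A"] * 5 + ["B"] * 2

# Assumed background: 100 drugs, 40 with indication A, only 2 with B.
p_a = enrichment_p(k=5, n=7, K=40, N=100)
p_b = enrichment_p(k=2, n=7, K=2, N=100)

# Purity of a toy two-cluster result (the second cluster is also invented).
toy_purity = purity([cluster, ["C", "C", "C", "A"]])

print(f"p(A) = {p_a:.4f}, p(B) = {p_b:.4f}, purity = {toy_purity:.3f}")
```

With these assumed counts, A dominates the cluster but its p-value stays above 0.05 (roughly 0.09), while the two B drugs give a much smaller p-value (roughly 0.004) because B is rare overall. The p-value alone would therefore rank B above A, even though A is what the cluster mostly contains.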

When comparing the enrichment results, I want to report how many unique drug classes are significantly enriched within each cluster and then compare those counts across the clustering methods. I do not know what p-value threshold would be appropriate for my data, because the p-values span a wide range across the enrichment results from all three clustering methods.

5.3 years ago by adavi140 • 0

Then you could use FDR-adjusted p-values and set your threshold to the proportion of false positives among the significant results that you find acceptable, e.g. 0.1 or 0.05 or even lower.
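As a rough sketch of that workflow, the block below applies the Benjamini-Hochberg adjustment (one common FDR procedure, assumed here since the thread does not name one) and counts the unique drug classes that pass the threshold for each clustering method. In R, `p.adjust(p, method = "BH")` performs the same adjustment. The p-values, class names and method names are made up, `bh_adjust` and `enriched_classes` are hypothetical helpers, and pooling all tests of one method into a single adjustment is an assumption.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enriched_classes(results, alpha=0.05):
    """Set of drug classes with an adjusted p-value <= alpha.

    results: list of (cluster_id, drug_class, raw_p) for one clustering method.
    """
    adjusted = bh_adjust([p for _, _, p in results])
    return {cls for (_, cls, _), q in zip(results, adjusted) if q <= alpha}


# Hypothetical Fisher's exact test results for two clustering methods.
results_by_method = {
    "method_1": [(1, "statin", 0.001), (1, "nsaid", 0.20),
                 (2, "nsaid", 0.004), (2, "statin", 0.60)],
    "method_2": [(1, "statin", 0.03), (1, "nsaid", 0.04),
                 (2, "nsaid", 0.50), (2, "statin", 0.70)],
}

unique_counts = {
    method: len(enriched_classes(results))
    for method, results in results_by_method.items()
}
print(unique_counts)
```

In this toy data, method_2 has two raw p-values below 0.05, but neither survives the adjustment, which is why the threshold should be applied to the adjusted values rather than the raw ones.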

5.3 years ago by Jean-Karim Heriche 27k