Determining Significance of Enrichment Analysis Within Clustering to Compare Clustering Methods
5.3 years ago
adavi140 • 0

Hello,

I have a dataset of drugs and their corresponding indications. Using this dataset, I applied three different clustering methods to try to group drugs with similar indications. To identify which method produced the clustering that best reflects my data, I ran an enrichment analysis of drug therapeutic classes for each cluster and each method using Fisher's exact test.

I now have the corresponding p-values for each cluster. I am unsure how to identify which p-values actually represent a significant enrichment of a drug class within a cluster, and how to compare the enrichment results between clustering methods.

R • 850 views
5.3 years ago

If the goal of clustering is to get clusters that recover drug indications as well as possible, then I would simply go for the clustering with the best purity relative to indication. To define a statistically significant enrichment you need to set a p-value threshold (and don't forget to account for multiple testing). However, having a p-value for each cluster doesn't help in comparing clustering methods, since it doesn't tell you about effect size (i.e. how many drugs of a given class are in a cluster). In a hypothetical example, you could have a cluster with five drugs with indication A and two drugs with indication B, and the enrichment test could return indication B as significantly enriched but not A, for instance because A is common across the whole dataset while B is rare.
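To make both points concrete, here is a minimal sketch in plain Python (the thread is tagged R, but the logic carries over directly). The function names `purity` and `enrichment_p`, the toy labels, and the dataset sizes (100 drugs, 60 with indication A, 4 with indication B) are all assumptions made up for illustration; they are not taken from the original data. The enrichment p-value is computed as the upper tail of the hypergeometric distribution, which is what a one-sided Fisher's exact test for over-representation evaluates.

```python
from collections import Counter
from math import comb


def purity(cluster_ids, class_labels):
    """Fraction of drugs that belong to the majority class of their cluster."""
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")
    per_cluster = {}
    for cluster, label in zip(cluster_ids, class_labels):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(max(c.values()) for c in per_cluster.values())
    return majority_total / len(class_labels)


def enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    P(X >= k), where X is the number of drugs of a class found in a
    cluster of n drugs, given that K of the N drugs overall are in the class.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical toy labels: 8 drugs, two indications, two clustering methods.
classes = ["A", "A", "A", "B", "B", "B", "A", "B"]
method_1 = [1, 1, 1, 2, 2, 2, 1, 2]  # clusters match the indications
method_2 = [1, 1, 2, 2, 1, 1, 2, 2]  # clusters mix the indications
print(purity(method_1, classes), purity(method_2, classes))

# Hypothetical cluster of 7 drugs: 5 with indication A, 2 with indication B,
# drawn from 100 drugs of which 60 have indication A and 4 have indication B.
p_a = enrichment_p(5, 7, 60, 100)
p_b = enrichment_p(2, 7, 4, 100)
print(p_a, p_b)
```

With these made-up numbers the rare indication B comes out below 0.05 while the more numerous indication A does not, which is why the p-value alone says little about how well a cluster captures a class. Note also that purity tends to increase with the number of clusters, so it is most informative when the methods being compared produce a similar number of clusters.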


When comparing the enrichment results, I want to report how many unique drug classes are significantly enriched within each cluster and then compare those counts across the clustering methods. I do not know what an appropriate p-value threshold would be for my data, as the p-values from the enrichment results of all three clustering methods span a large range.


Then you could use FDR-adjusted p-values and set your threshold to the proportion of false positives among the significant results that you find acceptable, e.g. 0.1 or 0.05 or even lower.
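As a rough sketch of what that could look like, the snippet below applies the Benjamini-Hochberg adjustment (one common FDR procedure; the reply does not say which one to use) and counts the unique drug classes that pass the threshold for one clustering method. The function names, the `(cluster, drug_class, p_value)` tuple layout, the example class names and the p-values are all hypothetical. Adjusting across all tests of a single method is also an assumption; pooling the tests of all three methods before adjusting would be another defensible choice.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_enriched_classes(results, alpha=0.05):
    """Count unique drug classes with an adjusted p-value <= alpha.

    results: list of (cluster, drug_class, p_value) tuples for ONE method.
    """
    adjusted = bh_adjust([p for _, _, p in results])
    enriched = {cls for (_, cls, _), q in zip(results, adjusted) if q <= alpha}
    return len(enriched)


# Hypothetical Fisher's exact test results for one clustering method.
method_results = [
    ("cluster_1", "antihypertensive", 0.001),
    ("cluster_1", "statin", 0.20),
    ("cluster_2", "statin", 0.012),
    ("cluster_2", "antihypertensive", 0.60),
]
print(bh_adjust([p for _, _, p in method_results]))
print(count_enriched_classes(method_results, alpha=0.05))
```

Running the same count for each clustering method with the same threshold gives numbers that can be compared directly, regardless of how wide the raw p-value range is.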
