Question: Patient Clustering Methods for Complex-Phenotype GWAS
3.6 years ago
United States
What are the drawbacks of using hierarchical clustering to construct cohorts for GWAS? For example, suppose I do not trust my classification labels (as with mental disorder diagnoses from the DSM/ICD) and believe they draw false boundaries between complex disease states. Can I disregard them and run a GWAS on cohorts of patients with mental disorders, defined by cutting the hierarchical clustering tree at a given level? Has anyone already done this?

3.6 years ago
If I understand your question correctly, the answer is you can't do that.

In a GWAS study you would expect a few out of hundreds of thousands of SNPs to be responsible for your phenotype.

With an unsupervised method, the effect of those few SNPs would be swamped by the much stronger effects of thousands of other SNPs that drive differences such as ethnic background.

There is no way to separate your cases from your controls using unsupervised approaches on GWAS data.

If unsupervised clustering of GWAS data separates your cases and controls any better than random, there are probably confounding factors at work, such as batch effects or population stratification.
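
To make the argument concrete, here is a toy simulation, offered as a rough sketch rather than a real analysis. Everything in it is a made-up assumption: two ancestry groups whose allele frequencies differ (deliberately exaggerated) across 400 background SNPs, three causal SNPs with identical frequencies in both groups, a simple liability-threshold phenotype, and a hand-rolled 2-means clustering standing in for whatever unsupervised method you prefer.

```python
import random

random.seed(0)

N_PER_POP = 100   # individuals per ancestry group (hypothetical)
N_SNPS = 400      # background SNPs whose frequencies differ by ancestry
N_CAUSAL = 3      # causal SNPs, same frequency in both groups


def clip(x, lo=0.05, hi=0.95):
    return max(lo, min(hi, x))


def draw_genotype(p):
    # Allele count (0, 1 or 2) for one individual at allele frequency p.
    return (random.random() < p) + (random.random() < p)


# Exaggerated ancestry differences, for illustration only.
freq_a = [random.uniform(0.1, 0.9) for _ in range(N_SNPS)]
freq_b = [clip(p + random.uniform(-0.4, 0.4)) for p in freq_a]

genotypes, ancestry = [], []
for pop, freqs in enumerate((freq_a, freq_b)):
    for _ in range(N_PER_POP):
        background = [draw_genotype(p) for p in freqs]
        causal = [draw_genotype(0.5) for _ in range(N_CAUSAL)]
        genotypes.append(background + causal)
        ancestry.append(pop)

# Phenotype depends only on the causal SNPs plus noise; the top half are cases.
liability = [sum(g[N_SNPS:]) + random.gauss(0, 1) for g in genotypes]
median = sorted(liability)[len(liability) // 2]
is_case = [int(x > median) for x in liability]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def two_means(rows, n_iter=10):
    # Minimal 2-means: start from row 0 and the row farthest from it.
    far = max(range(len(rows)), key=lambda i: sq_dist(rows[0], rows[i]))
    centroids = [list(rows[0]), list(rows[far])]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        labels = [
            0 if sq_dist(r, centroids[0]) <= sq_dist(r, centroids[1]) else 1
            for r in rows
        ]
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def agreement(a, b):
    # Fraction of matching labels, ignoring which cluster is called 0 or 1.
    same = sum(x == y for x, y in zip(a, b)) / len(a)
    return max(same, 1 - same)


clusters = two_means(genotypes)
print("agreement with ancestry:   ", agreement(clusters, ancestry))
print("agreement with case status:", agreement(clusters, is_case))
```

In this setup the clusters should track ancestry closely while agreeing with case status at roughly chance level, which is the situation described above: the genome-wide signal that clustering picks up is ancestry, not the handful of causal SNPs.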


"In a GWAS study you would expect a few out of hundreds of thousands of SNPs to be responsible for your phenotype."

This expectation is certainly not shared by everyone doing GWAS. Alkes Price has done a lot of work showing that GWAS may have failed to show greater evidence of association because thousands of markers contribute at significance levels that fall short of the 5 × 10⁻⁸ threshold commonly used in GWA studies.
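
For context on that threshold, a small back-of-the-envelope sketch follows. It assumes the conventional reading of 5 × 10⁻⁸ as a Bonferroni correction of 0.05 for roughly one million independent common-variant tests. The sample size and per-SNP variance explained are hypothetical numbers chosen only for illustration, and the expected test statistic uses the rough approximation z ≈ sqrt(N × variance explained).

```python
from math import sqrt
from statistics import NormalDist

ALPHA = 0.05
N_INDEPENDENT_TESTS = 1_000_000   # conventional approximation, an assumption

# Bonferroni-style genome-wide significance threshold.
threshold = ALPHA / N_INDEPENDENT_TESTS

# Two-sided z-score a single SNP must exceed to reach that threshold.
z_threshold = NormalDist().inv_cdf(1 - threshold / 2)

# Hypothetical small-effect SNP: explains 0.01% of trait variance.
sample_size = 10_000
variance_explained = 1e-4
expected_z = sqrt(sample_size * variance_explained)  # rough approximation

print(f"genome-wide threshold: {threshold:.1e}")
print(f"required |z|:          {z_threshold:.2f}")
print(f"expected z for SNP:    {expected_z:.2f}")
```

Under these assumptions a single small-effect marker sits far below the required z-score, so thousands of such markers could contribute to a trait without any of them being declared significant on their own.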

