What are the cons of using hierarchical clustering to construct cohorts for GWAS studies? For example, if I do not trust my target labels (classification labels) (as in the case of mental disorder classifications from the DSM/ICD) and believe the labels to represent false boundaries between complex disease states, can I not just disregard them and perform GWAS on cohorts of patients with mental disorders as defined by a given level "cut" in the hierarchical clustering? Has anyone already done this?
If I understand your question correctly, the answer is you can't do that.
In a GWAS you would expect only a few out of hundreds of thousands of SNPs to be associated with your phenotype.
With an unsupervised method, the signal from those few SNPs would be swamped by the much stronger effects of thousands of other SNPs that track differences such as ethnic background.
In practice you will not be able to separate cases from controls by applying an unsupervised approach to GWAS data.
If unsupervised clustering of GWAS data does separate your cases and controls any better than random, there are probably confounding factors at work, such as batch effects or population stratification.
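To make the argument concrete, here is a minimal toy simulation, not a real GWAS pipeline. Everything in it is an assumption chosen for illustration: two populations whose allele frequencies differ at 400 SNPs (the difference is deliberately exaggerated), 5 causal SNPs with identical frequencies in both populations, and a phenotype defined by a median split on a noisy liability score. It uses a plain 2-means clustering rather than hierarchical clustering, simply because that is short to write with the standard library; the point about what the clusters pick up should be the same.

```python
import random


def simulate(n_per_pop=100, n_strat=400, n_causal=5, seed=0):
    """Toy genotypes (0/1/2 allele counts) for two populations (assumed setup)."""
    rng = random.Random(seed)
    freqs = []
    for _ in range(n_strat):
        base = rng.uniform(0.25, 0.75)
        shift = 0.15 * rng.choice([-1, 1])  # exaggerated ancestry difference
        freqs.append((base + shift, base - shift))
    # Causal SNPs have the same allele frequency in both populations.
    freqs += [(0.5, 0.5)] * n_causal

    genotypes, population, liability = [], [], []
    for pop in (0, 1):
        for _ in range(n_per_pop):
            g = [(rng.random() < f[pop]) + (rng.random() < f[pop]) for f in freqs]
            genotypes.append(g)
            population.append(pop)
            # Phenotype depends only on the causal SNPs plus noise.
            liability.append(sum(g[n_strat:]) + rng.gauss(0, 1))

    cutoff = sorted(liability)[len(liability) // 2]
    case = [int(x >= cutoff) for x in liability]
    return genotypes, population, case


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(rows, n_iter=10):
    """Plain 2-means: start from the first row and the row farthest from it."""
    c0 = rows[0]
    c1 = max(rows, key=lambda r: sq_dist(r, c0))
    labels = None
    for _ in range(n_iter):
        new_labels = [int(sq_dist(r, c1) < sq_dist(r, c0)) for r in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        groups = ([], [])
        for r, lab in zip(rows, labels):
            groups[lab].append(r)
        if not groups[0] or not groups[1]:
            break
        c0, c1 = ([sum(col) / len(g) for col in zip(*g)] for g in groups)
    return labels


def agreement(a, b):
    """Fraction of matching labels, ignoring which cluster is called 0 or 1."""
    same = sum(x == y for x, y in zip(a, b)) / len(a)
    return max(same, 1 - same)


genotypes, population, case = simulate()
clusters = two_means(genotypes)
print("agreement with ancestry:   ", agreement(clusters, population))
print("agreement with case status:", agreement(clusters, case))
```

Under these assumptions the clusters should line up with ancestry and sit close to chance for case status, which is the confounding pattern described above.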