Question: Tips for clustering phenotype data into classes prior to GWAS?
5.0 years ago by
gavinmdouglas10 wrote:

I have a table of metabolite concentrations for ~100 metabolites in ~300 plant cultivars. Eventually I plan on running a GWAS to link these phenotypes to sequence data from these cultivars. I would also like to combine the metabolites into different clusters in case that increases power. I have run a PCA and simple hierarchical clustering in R, but I'm wondering whether there is another approach I should be using? If anyone has any recommendations they would be appreciated! I haven't been able to find any standard approach in a number of different GWAS papers I have looked at.







phenotype gwas • 1.8k views
written 5.0 years ago by gavinmdouglas10

There are many clustering algorithms, each with its own characteristics, in particular regarding the kinds of structure it is best able to find. Usually, if there is some structure/pattern in the data, most common algorithms will be able to find it. If you don't see any pattern with hierarchical clustering and/or PCA but you expect structure in the data, then that structure probably doesn't conform to the assumptions of these methods.
Note also that the choice of distance/similarity measure is important. For example, Euclidean distance is often useless with vectors of more than ~20 noisy variables because it is subject to distance concentration. Distance concentration is the effect by which, in high-dimensional spaces, the farthest and closest neighbours of a point end up at nearly the same distance; put another way, the distance measure tends towards a constant. In particular, if your 100 metabolite concentrations are i.i.d., then Euclidean distance will most likely be useless. How well separated your clusters are will determine whether distance concentration is an issue.
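
To get a feel for distance concentration, here is a minimal sketch in plain Python (not part of the original analysis). It draws points i.i.d. uniform on [0, 1], which is an assumption made purely for illustration and not a model of real metabolite data, and reports the relative contrast (d_max - d_min) / d_min of Euclidean distances from one query point. The helper name `relative_contrast`, the seed, and the choice of 300 points and 2/20/100 dimensions are all hypothetical.

```python
import math
import random


def relative_contrast(n_points, n_dims, rng):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    query point to n_points other points, all i.i.d. uniform on [0, 1]."""
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return (max(dists) - min(dists)) / min(dists)


rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable

# 300 points mirrors the ~300 cultivars; 100 dims mirrors the ~100 metabolites.
contrast = {d: relative_contrast(300, d, rng) for d in (2, 20, 100)}

for d, c in contrast.items():
    print(f"{d:>3} dims: relative contrast = {c:.2f}")
```

A relative contrast that shrinks as the number of dimensions grows means the nearest and farthest neighbours become hard to tell apart. If your real data contain well-separated clusters, the contrast computed on them should stay noticeably larger than for i.i.d. noise of the same dimension.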

written 5.0 years ago by Jean-Karim Heriche23k

Thanks for your response, I've never heard of distance concentration and will look into it.

written 5.0 years ago by gavinmdouglas10

Hi Gavin, How did you perform your analysis? I am now in a similar situation and clueless. Any suggestion would be appreciated.

Thanks, Abhishek

written 3.4 years ago by abhishekniroula750