Question: Best Clustering Algorithms for Mutation Data?
14 months ago, blazer9131 (United States) wrote:

Hey y'all.

I have a project with about 50-60 samples with exome sequencing data. I have genotyped these samples, and ~150 genes carry alterations of different kinds: missense, nonsense, indels, amplifications, deletions, etc. I tiered them by biological significance: 3 is significant impact, 2 is some impact, and 1 is little impact. A sample w/o a mutation in a given gene got a 0.

I imported this into R as a data frame and tried classic hierarchical clustering with hclust, then made a few heatmaps/dendrograms. I used ward.D2 as the linkage method, but I'm not very skilled in statistics, and I'm not sure whether there is a better algorithm for this dataset. Would anyone know a better method/algorithm? I'm trying to classify/group these samples using the exonic data I have.


Please include sample data. What result did you expect, and what result did you actually get when you ran your analysis? With that information we can help improve your analysis.

written 14 months ago by anicet.ebou140

Clustering is about grouping items by similarity/proximity. You need to define what similarity/proximity is relevant in your case, i.e. what items in the same cluster should share that differentiates them from items in another cluster. This helps in selecting the similarity measure used for clustering. The choice of clustering algorithm can then depend on what you know or assume about the cluster structure.
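To make the point about the similarity measure concrete, here is a minimal sketch on made-up data (written in Python only to keep it dependency-free; the sample names, scores, and function names are all hypothetical, not taken from the question). It compares two possible definitions of proximity on the 0-3 impact tiers: Manhattan distance, which treats the tiers as a numeric scale, and Jaccard distance, which only asks whether a gene is mutated at all. In R, `dist()` with `method = "manhattan"` or `method = "binary"` should give the corresponding distance matrices to pass to `hclust`.

```python
# Hypothetical toy data: rows are samples, columns are genes.
# Values are impact tiers: 0 = no mutation ... 3 = significant impact.
scores = {
    "S1": [3, 0, 3, 0],
    "S2": [1, 0, 1, 0],  # same genes hit as S1, but low impact
    "S3": [3, 0, 3, 1],  # same high-impact hits as S1, plus one extra gene
}


def manhattan(a, b):
    """Sum of absolute tier differences; treats tiers as a numeric scale."""
    return sum(abs(x - y) for x, y in zip(a, b))


def jaccard_distance(a, b):
    """1 - (genes mutated in both / genes mutated in either); ignores tiers."""
    both = sum(1 for x, y in zip(a, b) if x > 0 and y > 0)
    either = sum(1 for x, y in zip(a, b) if x > 0 or y > 0)
    return 0.0 if either == 0 else 1.0 - both / either


def nearest(name, metric):
    """Return the other sample closest to `name` under the given metric."""
    others = [n for n in scores if n != name]
    return min(others, key=lambda n: metric(scores[name], scores[n]))


print(nearest("S1", manhattan))         # S3: the impact levels match best
print(nearest("S1", jaccard_distance))  # S2: exactly the same genes are hit
```

The two measures disagree about which sample is closest to S1, so any clustering built on them can group the samples differently. Which one is appropriate depends on whether the impact tier or the mere presence of a mutation is the biologically relevant signal.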

written 14 months ago by Jean-Karim Heriche (18k)