Question: Grouping/Clustering Bam Files / Samples Based On Similarity/Any Other 'Distance' Metric
7.9 years ago, bav8718 wrote:


I have about 100 paired-end NGS exome samples, each with an aligned BAM file. I wish to group them based on similarity or some other 'distance' metric using a clustering algorithm. The goal is to reduce the batch effect in downstream structural variant analysis if we were to analyze all samples together. The only plausible way I can see to reduce this batch effect is to divide the samples into groups.

Samples within each group should be highly correlated with one another.

I was wondering which parameter from the BAM files I should use to group these samples, let's say by K-means clustering?
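
To make the idea concrete, here is a minimal sketch of what I have in mind, assuming per-target read depth is the feature (for example, summed depth per exome target from something like `samtools bedcov`). The sample names, depth values, and helper functions are made up for illustration, and whether depth is the right parameter at all is exactly what I am asking.

```python
import math

# Hypothetical toy input: raw read depth per exome target for each sample.
# Real values would be extracted from the BAM files beforehand.
depths = {
    "s1": [100, 200, 50, 400],
    "s2": [110, 190, 55, 420],
    "s3": [300, 60, 200, 90],
    "s4": [280, 70, 210, 100],
}

def normalize(raw):
    """Turn raw depths into centered log2 fractions of the sample total,
    so that overall sequencing depth does not drive the clustering."""
    total = sum(raw) + len(raw)  # pseudocount of 1 per target
    logs = [math.log2((d + 1) / total) for d in raw]
    mean = sum(logs) / len(logs)
    return [x - mean for x in logs]

def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(profiles, k, n_iter=50):
    """Plain K-means with deterministic farthest-point initialization."""
    centers = [profiles[0]]
    while len(centers) < k:
        # Next center: the profile farthest from all centers chosen so far.
        centers.append(
            max(profiles, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in profiles
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

names = sorted(depths)
profiles = [normalize(depths[n]) for n in names]
labels = kmeans(profiles, k=2)
groups = dict(zip(names, labels))
print(groups)
```

With 100 samples I would have to choose the number of groups somehow, and I do not know whether Euclidean distance on depth profiles is a sensible stand-in for correlation.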

Any suggestions would be highly appreciated.

clustering • 2.0k views

What is the "downstream structural variant analysis" you are going to perform? What batch effect do you expect will confound that analysis? What is the biological question you are trying to answer with your analysis? Any details you have will be helpful in figuring out what you want to do.

written 7.9 years ago by Sean Davis

If your goal is to do structural variant analysis on exome data, I think you first have to come up with a strategy for that. I think it is difficult, if not impossible. CNV and LOH analysis is doable but depends on a control sample (you can also do without one, but having a control is superior).

written 7.9 years ago by Irsan