Grouping/Clustering BAM Files (Samples) Based on Similarity or Another Distance Metric
11.4 years ago
bav8718 • 0

Hello,

I have about 100 paired-end NGS exome samples, each with its own aligned BAM file. I want to group them by similarity (or some other distance metric) using a clustering algorithm. The goal is to reduce the batch effect that would affect downstream structural variant analysis if we analyzed all samples together. The only plausible way to reduce this batch effect is to divide the samples into groups.

The groups should be such that the samples within each group are highly correlated.
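
One way to make "highly correlated" concrete (an assumption, not something specified in the question) is to summarise each BAM as coverage per exome capture target, for example with `samtools bedcov` against the capture BED file (it reports summed per-base depth for each region), normalise each sample for sequencing depth, and compute the Pearson correlation between samples. A minimal pure-Python sketch; the counts below are hypothetical toy values, not real BAM-derived data:

```python
import math

def log_cpm(counts):
    """Depth-normalise per-target coverage (counts per million), then log2(x + 1)."""
    total = sum(counts)
    return [math.log2(c / total * 1e6 + 1) for c in counts]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(samples):
    """Pairwise correlations between samples' normalised coverage profiles."""
    names = list(samples)
    profiles = [log_cpm(samples[n]) for n in names]
    return names, [[pearson(p, q) for q in profiles] for p in profiles]

# Hypothetical coverage over five capture targets for four samples.
samples = {
    "S1": [100, 200, 50, 400, 80],
    "S2": [210, 390, 110, 800, 150],   # roughly S1 at twice the depth
    "S3": [300, 90, 250, 100, 300],    # a different capture pattern
    "S4": [150, 45, 130, 55, 140],     # roughly S3 at half the depth
}
names, corr = correlation_matrix(samples)
for name, row in zip(names, corr):
    print(name, " ".join(f"{r:+.2f}" for r in row))
```

Because of the depth normalisation, S1 and S2 correlate strongly even though S2 was sequenced about twice as deeply, while S1 and S3 do not.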

Which parameter from the BAM files should I use to group these samples with, let's say, K-means clustering?
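
A common choice of feature (an assumption here, not an answer given in this thread) is the per-target coverage profile itself: normalise each sample for sequencing depth, log-transform, and cluster the resulting vectors. Below is a minimal pure-Python sketch of K-means (Lloyd's algorithm with a deterministic farthest-point initialisation) on hypothetical toy counts; `cluster_samples` and the sample data are illustrative only.

```python
import math

def log_cpm(counts):
    """Depth-normalise per-target coverage (counts per million), then log2(x + 1)."""
    total = sum(counts)
    return [math.log2(c / total * 1e6 + 1) for c in counts]

def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    # Farthest-point initialisation: deterministic and spreads the starting centroids out.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_samples(samples, k):
    """Hypothetical helper: map sample name -> cluster label."""
    names = list(samples)
    points = [log_cpm(samples[n]) for n in names]
    return dict(zip(names, kmeans(points, k)))

# Hypothetical per-target counts: two "batches" with different capture patterns.
samples = {
    "S1": [500, 480, 100, 120, 300],
    "S2": [1000, 950, 210, 230, 610],
    "S3": [260, 250, 55, 60, 150],
    "S4": [110, 100, 520, 500, 300],
    "S5": [230, 220, 1010, 990, 600],
    "S6": [50, 60, 240, 260, 150],
}
groups = cluster_samples(samples, k=2)
print(groups)
```

K-means needs `k` chosen up front; for ~100 exomes with tens of thousands of targets, a NumPy/scikit-learn implementation (`sklearn.cluster.KMeans`) or hierarchical clustering on 1 minus the correlation is more practical than this toy loop.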

Any suggestions would be highly appreciated.

clustering • 2.6k views

What is the "downstream structural variant analysis" you are going to perform? What batch effect do you expect to confound that analysis? What biological question are you trying to answer with it? Any details you can share will help in figuring out what you want to do.


If your goal is to do structural variant analysis on exome data, I think you first have to come up with a strategy for that. I think it is difficult, if not impossible. CNV and LOH analysis is doable but depends on a control sample (you can also do without one, but having a control is superior).
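
To illustrate the point about controls (a sketch under stated assumptions, not a CNV caller): exome CNV methods typically compare each sample's depth-normalised coverage with a reference. Without a matched normal, that reference can be built from other samples; ExomeDepth, for instance, picks a reference set of the most correlated exomes, and CNVkit can build a pooled reference from several normals. This is also where grouping samples by coverage similarity pays off. The function below computes per-target log2 ratios against the median of a reference group; `log2_ratios` and the counts are hypothetical, and real data would also need GC/mappability correction and segmentation.

```python
import math
import statistics

def cpm(counts):
    """Scale per-target coverage to counts per million."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]

def log2_ratios(test_counts, reference_samples, pseudocount=1.0):
    """Per-target log2(test / median of reference group), median-centred."""
    test = cpm(test_counts)
    refs = [cpm(r) for r in reference_samples]
    raw = []
    for i, t in enumerate(test):
        ref_median = statistics.median(r[i] for r in refs)
        raw.append(math.log2((t + pseudocount) / (ref_median + pseudocount)))
    # Centre on the median so that most (presumably copy-neutral) targets sit near 0.
    shift = statistics.median(raw)
    return [r - shift for r in raw]

# Hypothetical: target 2 is at half depth in the test sample (e.g. a one-copy loss).
reference_group = [[100, 100, 100, 100], [200, 200, 200, 200], [50, 50, 50, 50]]
ratios = log2_ratios([100, 100, 50, 100], reference_group)
print([round(r, 2) for r in ratios])
```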
