Question: Multi-Sample Somatic Mutation Calling
jockbanan (390), Czech Republic, wrote 6.9 years ago:

Hi all! I have 4 pairs of matched tumor/normal exome sequencing experiments. These are from 4 patients with the same type of tumor. I want to detect tumor-specific somatic mutations.

Looking at the documentation of SomaticSniper, VarScan, GATK somaticIndelDetector and other tools, it seems they can all only process one pair (one patient) at a time. I was wondering whether there is a tool capable of multi-sample analysis, one that uses the information from all the patients when reporting tumor-specific variants. I can always process these 4 pairs separately and then compare the results myself, but if some tool could apply its statistical model to multiple samples directly, I would like to try it. Do you have any suggestions? Thanks.

modified 6.9 years ago by DG (7.2k) • written 6.9 years ago by jockbanan (390)

What gains do you think will come from analyzing multiple samples concurrently? Though there are hotspots in a few driver genes, most cancer samples have a largely unique somatic mutation profile.

written 6.9 years ago by Chris Miller (21k)

I think there could be some value in eliminating false-positive calls by checking whether they also appear in unpaired normals. But I'm not sure how much better an integrated analysis would be compared to a post-calling heuristic filter.

written 6.9 years ago by Christian (3.0k)

This is true, but to really get a reliable feel for false-positive sites from the normals, I'd want way more than 4 samples.

written 6.9 years ago by Chris Miller (21k)

Exactly, false-positives are the reason. And, well, yes, I would also like to have way more samples...

written 6.9 years ago by jockbanan (390)

Yes, there is definitely value in this. It's probably best to stick to downstream tools. You might also want to think about maintaining some sort of "master" VCF, for instance a merged VCF, that holds data about all the samples you collect. You can then use tabix and other tools to quickly see how many times a specific mutation was seen in your normal samples and apply that information to downstream heuristics and filters as appropriate.
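As a rough illustration of the "count how often a mutation shows up in the normals" idea, here is a minimal pure-Python sketch. It is not taken from any tool mentioned in this thread. It assumes the merged VCF holds only normal samples, that records are biallelic, and that every record has a GT field; the function names and the `max_normals` threshold are hypothetical. For real data you would more likely query an indexed file with tabix or bcftools than read it line by line.

```python
from collections import Counter


def variant_key(fields):
    """Build a (chrom, pos, ref, alt) key from the split fields of a VCF data line."""
    return (fields[0], int(fields[1]), fields[3], fields[4])


def count_carriers(merged_vcf_lines):
    """Count, per variant, how many samples carry a non-reference genotype.

    Assumes a merged, multi-sample VCF of normal samples with biallelic records.
    """
    counts = Counter()
    for line in merged_vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        gt_index = fields[8].split(":").index("GT")
        carriers = 0
        for sample in fields[9:]:
            genotype = sample.split(":")[gt_index]
            alleles = genotype.replace("|", "/").split("/")
            # "0" is the reference allele and "." is a missing call
            if any(allele not in ("0", ".") for allele in alleles):
                carriers += 1
        counts[variant_key(fields)] = carriers
    return counts


def filter_somatic(somatic_keys, normal_counts, max_normals=0):
    """Keep somatic candidates seen in at most `max_normals` normal samples."""
    return [key for key in somatic_keys if normal_counts.get(key, 0) <= max_normals]
```

With only a handful of normals this filter is a heuristic at best; the counts become more informative as the merged file grows.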

written 6.9 years ago by DG (7.2k)

Consider leveraging publicly available data from TCGA, or even 1000 genomes if you're just looking at the normals anyway.

written 6.9 years ago by Chris Miller (21k)

DG (7.2k) wrote 6.9 years ago:

I think it generally comes down to two things: computational overhead (you are already comparing two datasets in a run) and not wanting to deal with the potential complexities of parsing multi-sample matched data. That said, there are plenty of downstream tools for merging, comparing, and annotating VCF files to get to the shared somatic variants. snpEff and GEMINI, for instance, are great tools for annotating and data mining your results.
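To make the "compare per-patient results downstream" step concrete, here is a small Python sketch that finds somatic variants reported in more than one patient. It is an illustration only, not how snpEff or GEMINI work: it assumes one somatic VCF per patient from whichever caller you ran, matches variants on exact (chrom, pos, ref, alt), and uses hypothetical function names and a hypothetical `min_patients` threshold.

```python
from collections import defaultdict


def read_variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from the data lines of one VCF."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def recurrent_variants(calls_by_patient, min_patients=2):
    """Map each variant seen in at least `min_patients` patients to those patients."""
    patients_by_variant = defaultdict(set)
    for patient, keys in calls_by_patient.items():
        for key in keys:
            patients_by_variant[key].add(patient)
    return {
        key: sorted(patients)
        for key, patients in patients_by_variant.items()
        if len(patients) >= min_patients
    }
```

Exact-match keys will miss the same indel written with different representations, so normalizing the VCFs before comparing them is worth considering.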

written 6.9 years ago by DG (7.2k)

Thanks for the reply, I'll stick to downstream tools then.

written 6.9 years ago by jockbanan (390)