Question: Mixed opinions on somatic variant calling method
umn_bist wrote, 3.9 years ago:

I have read previous posts (post1, post2) on how to call somatic variants, and the general practice seems to be to intersect multiple callers to ensure that low-VAF mutations are being called.

My approach was to use samtools, MuTect2, SomaticSniper, and VarScan2, but I found an interesting post saying that as long as read placement is perfect, any caller suffices (even samtools mpileup). I should mention that I am working with RNA-seq data from cancer samples (each matched with a normal sample).

In general, my view is as long as read placement is perfect, even naive methods work sufficiently well... To me, the simplest yet most effective strategy is to use two distinct alignment algorithms, such as bwa and bwa-sw, which have distinct error modes. You only consider mutations shared between the two alignments... Another complication is structural variations, in which I am less experienced. In some sense, false mutations caused by structural variations are still an indication of something different between normal and tumor... In all, I think you do not need to worry about which software to use for detecting somatic mutations - anything reasonable is fine. You should pay more attention to mismapping and structural variations.

First, does read placement mean how well aligners align our samples to the reference? Does working with RNA-Seq introduce higher error rates in read placements? Has the consensus changed or is intersecting multiple callers still recommended? Thank you very much for your help.

Chris Miller (Washington University in St. Louis, MO) wrote, 3.9 years ago:

If someone tells you that somatic mutation calling is easy or a solved problem, they have never really tried to do somatic mutation calling.  

There are a host of issues to contend with: sequencing artifacts, problems with the reference, differential coverage, mismappings (and yes, those are common!), etc. Your approach of using multiple callers seems sensible. The tricky part is figuring out how to combine them. Straight intersections will give you high specificity, but low sensitivity. Taking the union of the call sets will give you the opposite: high sensitivity, but low specificity. A more nuanced approach has been explored by recent tools like somaticseq (http://www.genomebiology.com/2015/16/1/197). I haven't used that one in particular, but I am convinced that an approach of that nature is most likely to succeed.
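
To make the intersection/union trade-off concrete, here is a minimal sketch of combining call sets by a "reported by at least k callers" rule. It assumes each caller's output has already been parsed into a set of (chromosome, position, ref, alt) tuples; the function name `combine_calls` and the toy calls are made up for illustration and are not part of any of the tools mentioned.

```python
from collections import Counter

# A variant is keyed by (chromosome, 1-based position, ref allele, alt allele).
def combine_calls(call_sets, min_callers):
    """Return the variants reported by at least `min_callers` call sets."""
    counts = Counter()
    for calls in call_sets.values():
        counts.update(set(calls))  # each caller votes at most once per variant
    return {variant for variant, n in counts.items() if n >= min_callers}

# Toy, made-up calls for illustration only (hypothetical data).
calls = {
    "mutect2": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"), ("chr3", 300, "C", "T")},
    "somaticsniper": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "varscan2": {("chr1", 100, "A", "T"), ("chr4", 400, "T", "G")},
}

union = combine_calls(calls, min_callers=1)                 # most sensitive
majority = combine_calls(calls, min_callers=2)              # middle ground
intersection = combine_calls(calls, min_callers=len(calls)) # most specific

print(len(union), len(majority), len(intersection))  # prints "4 2 1"
```

Setting `min_callers` between 1 and the number of callers trades sensitivity against specificity. This is still a simple vote; it is not what somaticseq does, which trains a classifier on features from the callers.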


Thank you for your reply. The reference you provided will be a great help. One thing - can you clarify this part of your response:

Straight intersections will give you high specificity, but low specificity.
written 3.9 years ago by umn_bist

Whoops - intersections will give you high *specificity* but low *sensitivity*. I'll edit my answer to fix that!

written 3.9 years ago by Chris Miller

No problem. Thank you for the clarification. Question: for these callers (specifically samtools mpileup), is there any documentation of common or established hard filters used for somatic variants?

modified 3.9 years ago • written 3.9 years ago by umn_bist

samtools mpileup will not be a good approach without some significant downstream work to determine the evidence for a normal genotype in the normal and a different (mutant) genotype in the tumor.  I'd stick to somatic variant callers for calling somatic variants. 
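
As a rough illustration of what that downstream work involves, here is a naive sketch that checks read counts at a single site in both samples. The function name `looks_somatic` and every threshold are hypothetical values chosen for the example, not established filters; dedicated somatic callers use statistical models rather than hard cutoffs like these.

```python
def looks_somatic(tumor_ref, tumor_alt, normal_ref, normal_alt,
                  min_depth=10, min_tumor_alt=3,
                  min_tumor_vaf=0.05, max_normal_vaf=0.01):
    """Naive check: alt allele supported in the tumor, absent from the normal.

    All thresholds are illustrative assumptions, not recommended values.
    """
    tumor_depth = tumor_ref + tumor_alt
    normal_depth = normal_ref + normal_alt
    # Without enough depth in BOTH samples, a germline variant cannot be ruled out.
    if tumor_depth < min_depth or normal_depth < min_depth:
        return False
    tumor_vaf = tumor_alt / tumor_depth
    normal_vaf = normal_alt / normal_depth
    return (tumor_alt >= min_tumor_alt
            and tumor_vaf >= min_tumor_vaf
            and normal_vaf <= max_normal_vaf)

# Hypothetical read counts: (tumor_ref, tumor_alt, normal_ref, normal_alt)
print(looks_somatic(40, 10, 50, 0))   # alt only in tumor
print(looks_somatic(40, 10, 25, 25))  # alt also in normal: likely germline
print(looks_somatic(40, 10, 3, 0))    # normal too shallow to judge
```

Only the first site passes. The third fails even though no alt reads were seen in the normal, because three reads are not enough evidence for a reference genotype there.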

written 3.9 years ago by Sean Davis

That is unfortunate. I am a new trainee, and samtools was what I was most comfortable with. I may just follow up on somaticseq and use their pipeline, considering the limitations of using any single caller. Thank you for your help.

Do you have any recommended papers that explain the challenges of analyzing somatic variants?

modified 3.9 years ago • written 3.9 years ago by umn_bist

Somaticseq uses a machine learning approach.  Therefore, to put it to best use, you need a training set.  I suspect that you don't have such a set, so you might want to start by running some tools like strelka, mutect, lofreq, varscan2, etc. 

http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-15-244

http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-189

http://bioinformatics.oxfordjournals.org/content/early/2013/07/13/bioinformatics.btt375.long

 

written 3.9 years ago by Sean Davis

There's some info that will be useful to you in our recent paper here:

Optimizing Cancer Genome Sequencing and Analysis
http://www.cell.com/cell-systems/abstract/S2405-4712(15)00113-1

See Figure 4 and the supplement for some detailed info on specific variant callers and how they performed on this ultra-deep, highly validated tumor. 

written 3.9 years ago by Chris Miller