Question: Why do different somatic mutation callers agree so poorly on tumor-normal pairs?
heartheone (China) wrote, 4.6 years ago:

I ran MuTect, Strelka, and VarScan 2 on multiple normal-tumor pairs of sequencing data, using default parameters and the recommended filtering.

To my disappointment, their concordance was really poor: only about 10%-30% of the calls made by one caller were also called by the other tools.
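A concordance figure like this is directional: the overlap is divided by the size of one caller's call set, so "A vs B" and "B vs A" can differ a lot when one caller makes many more calls. Below is a minimal sketch of how the numbers can be computed, assuming each caller's VCF has already been parsed and normalized into a set of (chrom, pos, ref, alt) keys. The function names and toy calls are hypothetical, not real data.

```python
from itertools import permutations

# A variant is keyed by (chrom, pos, ref, alt). Assumption: VCFs were already
# parsed and normalized (e.g. indels left-aligned) so the same mutation gets
# the same key from every caller; otherwise concordance is underestimated.
Variant = tuple[str, int, str, str]


def fraction_recovered(calls_a: set[Variant], calls_b: set[Variant]) -> float:
    """Fraction of caller A's calls that caller B also made."""
    if not calls_a:
        return 0.0
    return len(calls_a & calls_b) / len(calls_a)


def pairwise_concordance(callsets: dict[str, set[Variant]]) -> dict[tuple[str, str], float]:
    """Directional concordance for every ordered pair of callers."""
    return {
        (a, b): fraction_recovered(callsets[a], callsets[b])
        for a, b in permutations(callsets, 2)
    }


if __name__ == "__main__":
    # Hypothetical toy call sets for illustration only.
    callsets = {
        "mutect": {("chr1", 100, "C", "T"), ("chr1", 200, "G", "A"), ("chr2", 50, "A", "G")},
        "strelka": {("chr1", 100, "C", "T"), ("chr3", 10, "T", "C")},
        "varscan": {("chr1", 100, "C", "T"), ("chr1", 200, "G", "A")},
    }
    for (a, b), frac in pairwise_concordance(callsets).items():
        print(f"{a} calls also found by {b}: {frac:.0%}")
```

Matching on exact (chrom, pos, ref, alt) keys is the strictest choice; indels represented differently by two callers will count as discordant unless they are normalized first.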

Do you have any suggestions? I would be very grateful for any help!


Robert


If they all gave the same answer, there wouldn't need to be so many of them.  They each have different strengths and weaknesses, which is a feature, not a bug; it means that for a given type of analysis, one or another (or a combination of more than one) is the right tool for the job.

(reply by Jonathan Dursi, 4.6 years ago)
Charles Warden (Duarte, CA) wrote, 4.6 years ago:

I agree with Chris.  More specifically, I would recommend filtering the VarScan results.

For example, try requiring a minimum total coverage of 10 reads (in both tumor and normal), a minimum of 4 variant-supporting reads in the tumor sample, a minimum tumor allele frequency of 30%, and a *maximum* normal allele frequency of 5%.
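Those four thresholds can be written as a single per-site filter. This is a sketch, assuming the reference and variant read counts for tumor and normal have already been parsed from the caller's output into the hypothetical `SomaticCall` record below; its field names are illustrative and are not VarScan's actual column names.

```python
from dataclasses import dataclass


@dataclass
class SomaticCall:
    """Hypothetical per-site record; field names are illustrative, not VarScan's."""
    chrom: str
    pos: int
    normal_ref_reads: int
    normal_alt_reads: int
    tumor_ref_reads: int
    tumor_alt_reads: int


def vaf(ref_reads: int, alt_reads: int) -> float:
    """Variant allele frequency; 0.0 when there is no coverage."""
    depth = ref_reads + alt_reads
    return alt_reads / depth if depth else 0.0


def passes_conservative_filter(
    call: SomaticCall,
    min_depth: int = 10,          # total reads, required in BOTH tumor and normal
    min_tumor_alt: int = 4,       # variant-supporting reads in the tumor
    min_tumor_vaf: float = 0.30,  # tumor allele frequency floor
    max_normal_vaf: float = 0.05, # normal allele frequency ceiling
) -> bool:
    normal_depth = call.normal_ref_reads + call.normal_alt_reads
    tumor_depth = call.tumor_ref_reads + call.tumor_alt_reads
    return (
        normal_depth >= min_depth
        and tumor_depth >= min_depth
        and call.tumor_alt_reads >= min_tumor_alt
        and vaf(call.tumor_ref_reads, call.tumor_alt_reads) >= min_tumor_vaf
        and vaf(call.normal_ref_reads, call.normal_alt_reads) <= max_normal_vaf
    )


if __name__ == "__main__":
    calls = [
        SomaticCall("chr1", 100, 30, 0, 20, 12),  # tumor VAF 0.375, clean normal
        SomaticCall("chr1", 200, 30, 3, 20, 12),  # normal VAF ~0.09, too high
        SomaticCall("chr2", 50, 30, 0, 45, 5),    # tumor VAF 0.10, too low
    ]
    kept = [c for c in calls if passes_conservative_filter(c)]
    print([(c.chrom, c.pos) for c in kept])
```

The defaults encode the conservative thresholds above; `min_tumor_vaf` is the parameter to revisit for impure or subclonal samples.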

I recently applied these thresholds to somatic VarScan calls from WGS data and thought they gave decent results, although the Strelka 'passed' variants looked better for small indels.

For single-sample analysis, this paper provides benchmarks justifying a similar set of parameters:

https://peerj.com/articles/600/


If I'm looking for a rare subclone that contributes to therapy resistance, setting a 30% threshold is going to be a bad idea. Similarly, if I have an impure tumor, there may be nothing above 30%. Or if I have 500x coverage, we can expect to reliably detect variants at far lower VAFs. The point is that the parameters you use should be chosen intelligently, based on the details of your experiment.
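The coverage argument can be made concrete with a back-of-the-envelope calculation: how likely is it that a variant at a given VAF is even supported by enough reads to pass a read-count threshold? This sketch assumes a simple binomial model of read sampling (no sequencing or mapping error, no purity correction), and the function name is hypothetical.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(at least `min_alt_reads` variant reads) when reads ~ Binomial(depth, vaf).

    Simplified model: ignores sequencing/mapping error and tumor purity; it only
    asks whether enough supporting reads are likely to be present at all.
    """
    p_below = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    # A 5% VAF subclonal variant with a 4-read minimum, at increasing depth.
    for depth in (50, 100, 500):
        print(f"{depth}x: {detection_probability(depth, 0.05, 4):.3f}")
```

Under this model a 5% VAF variant is usually missed at 50x (roughly a one-in-four chance of reaching 4 reads), while at 500x it is almost always supported, which is why a fixed VAF cutoff chosen for one design can be far too strict for another.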

(reply by Chris Miller, 4.6 years ago)

True, that is a poor choice of parameters for studying subclones.

However, I would expect those parameters to identify variants that show a greater concordance rate with the Strelka / MuTect variant lists, if the user is interested in defining a conservative set of somatic variants.

(reply by Charles Warden, 4.6 years ago)
Chris Miller (Washington University in St. Louis, MO) wrote, 4.6 years ago:

That actually sounds about right, depending on the parameters that you used. Some notes:

1) I'd expect the concordance to be fairly high for high-VAF variants, but when you get down to rare subclonal variants, where only a small amount of read support exists, callers can handle those cases very differently.

2) Do you want high specificity or high sensitivity? Each caller has made its own set of trade-offs between the two, and finding the 'sweet spot' for you will depend on your experiment.

3) Intersecting the data will generally improve specificity (I'd expect 90%+ if something is called by all three callers), but will lose you a lot of the true positives at low VAF that may only be picked up by one of the statistical models. Again, it depends on what you're hoping to accomplish with your experiment.
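The intersect-versus-union trade-off in point 3 can be expressed as an "at least k of n callers" vote: k = n is the strict intersection (most specific), k = 1 is the union (most sensitive), and values in between are a compromise. This is a minimal sketch, assuming every caller's calls were normalized to the same (chrom, pos, ref, alt) keys; the function name and toy data are hypothetical.

```python
from collections import Counter

# Assumption: identical normalized keys across callers.
Variant = tuple[str, int, str, str]


def calls_supported_by(callsets: dict[str, set[Variant]], min_callers: int) -> set[Variant]:
    """Variants reported by at least `min_callers` of the given callers."""
    # Each caller contributes a set, so it can vote for a variant at most once.
    support = Counter(v for calls in callsets.values() for v in calls)
    return {v for v, n in support.items() if n >= min_callers}


if __name__ == "__main__":
    callsets = {
        "mutect": {("chr1", 100, "C", "T"), ("chr1", 200, "G", "A"), ("chr2", 50, "A", "G")},
        "strelka": {("chr1", 100, "C", "T"), ("chr3", 10, "T", "C")},
        "varscan": {("chr1", 100, "C", "T"), ("chr1", 200, "G", "A")},
    }
    for k in (3, 2, 1):
        print(k, sorted(calls_supported_by(callsets, k)))
```

A plain vote treats all callers equally and ignores VAF, so with a high k it will still drop the low-VAF true positives that only one statistical model picks up.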

heartheone (China) wrote, 4.6 years ago:

Thanks a lot for your advice!

I'm responding here because for some reason I can't add a comment or reply; the button is grayed out.

Thanks to your helpful advice, I found that each caller has its own drawbacks and preferences for detecting mutations in tumor tissue, and all of them need parameter optimization and further filtering. By checking calls manually in IGV, I found that MuTect generally produced the most reliable candidates, quite a few Strelka calls were affected by low-quality mapping, and VarScan 2 was not very sensitive to low-VAF calls.


Yes, Mutect is tuned to be reasonably sensitive and highly specific. You could certainly do worse if you're going to use a single caller approach. FWIW, intersecting callers in intelligent ways can give better results, but requires thinking carefully about how to do so.

(reply by Chris Miller, 4.6 years ago)
Powered by Biostar version 2.3.0