Calling somatic variants with VarScan
3.2 years ago
ww22runner

Hello everyone,

I am trying to use VarScan to find somatic variants, but I want to apply filters on the minimum variant frequency and the minimum number of variant reads. I have tumor-normal paired samples.

In this case, should I (a) run mpileup2snp and mpileup2indel separately with the filters set, or (b) run somatic and then somaticFilter on each of the files it produces (SNPs and indels)?

Is there a difference in how these two approaches work? Also, I want the output in VCF format, and according to the manual only mpileup2snp/mpileup2indel offer that option. Is that right?

Thanks!
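For reference, here is a minimal Python sketch of what the two thresholds in the question mean for a single site. The function and parameter names are hypothetical; they only mirror VarScan2's `--min-var-freq` and `--min-reads2` options, and whether VarScan compares with `>=` or `>` is an assumption here. VarScan also applies other filters (coverage, base quality) that are not shown.

```python
"""Illustration of a minimum-variant-frequency and minimum-variant-reads filter.

Hypothetical names; they mirror VarScan2's --min-var-freq and --min-reads2
options, but this is not VarScan's own code.
"""


def variant_frequency(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads supporting the variant allele (0.0 if no coverage)."""
    depth = ref_reads + alt_reads
    return alt_reads / depth if depth else 0.0


def passes_filters(ref_reads: int, alt_reads: int,
                   min_var_freq: float, min_reads2: int) -> bool:
    """True only if the site has enough variant reads AND a high enough frequency."""
    return (alt_reads >= min_reads2
            and variant_frequency(ref_reads, alt_reads) >= min_var_freq)


if __name__ == "__main__":
    # Hypothetical thresholds: at least 10% variant frequency and 4 variant reads.
    for ref, alt in [(90, 10), (18, 2), (300, 12)]:
        vaf = variant_frequency(ref, alt)
        print(ref, alt, f"{vaf:.3f}", passes_filters(ref, alt, 0.10, 4))
```

The middle site (18 reference reads, 2 variant reads) shows why both filters are useful: its 10% frequency passes, but two supporting reads do not.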

VCF output is available if the right option is given. I used the second of the workflows you mentioned, and it worked pretty well.

3.1 years ago
ATpoint

It is the second workflow that you want. I have a script on my GitHub that uses VarScan2 for somatic calling (I'm not claiming it is pretty or follows any good-practice style standards), but you might get some inspiration from it. It also uses GNU parallel to parallelize the process over all chromosomes.
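The per-chromosome parallelization can be sketched in Python with `concurrent.futures` instead of GNU parallel. This is not the script mentioned above: the file naming (`<sample>.<chrom>.pileup`), the jar name `VarScan.jar`, and the chromosome list are hypothetical, and the pileups are assumed to have been split by chromosome beforehand. With `dry_run=True` (the default) it only returns the commands, so it runs without VarScan installed.

```python
"""Sketch: one VarScan2 `somatic` job per chromosome, run in parallel.

Hypothetical file names and jar path; only the fan-out pattern is the point.
Requires Python 3.9+.
"""
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def chrom_command(chrom: str, normal: str, tumor: str) -> list[str]:
    """Hypothetical per-chromosome call on pileups already split by chromosome."""
    return ["java", "-jar", "VarScan.jar", "somatic",
            f"{normal}.{chrom}.pileup", f"{tumor}.{chrom}.pileup",
            f"somatic.{chrom}"]


def run_all(chroms, normal, tumor, jobs=4, dry_run=True):
    """Run (or, if dry_run, only format) one command per chromosome, `jobs` at a time."""
    cmds = [chrom_command(c, normal, tumor) for c in chroms]

    def run(cmd):
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if VarScan exits non-zero
        return shlex.join(cmd)

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() preserves input order, so results line up with `chroms`.
        return list(pool.map(run, cmds))


if __name__ == "__main__":
    for line in run_all(["chr1", "chr2"], "normal", "tumor"):
        print(line)
```

Threads are enough here because the heavy lifting happens in the external Java processes, and `pool.map` keeps the results in chromosome order so the per-chromosome outputs can be concatenated afterwards.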

It starts by calling the raw variants with VarScan2 somatic, then separates them into somatic and germline with processSomatic, selects high-confidence variants based on VarScan's Fisher's exact test, and finally runs the heuristic fpfilter to remove junk calls.
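A minimal sketch of the first two steps of that chain, building the command lines rather than running them. The option names (`--min-var-freq` and `--somatic-p-value` for `somatic`, `--p-value` for `processSomatic`) are as I recall them from the VarScan2 usage text, and the threshold values are placeholders, so check both against your VarScan version. fpfilter is left out because it additionally needs a bam-readcount file for the candidate sites.

```python
"""Sketch of the `somatic` -> `processSomatic` steps as command lines.

Assumptions: VarScan2 is run as `java -jar VarScan.jar` (hypothetical path),
option names should be verified against your version, and all default
threshold values below are placeholders, not recommendations.
"""
import shlex

VARSCAN = ["java", "-jar", "VarScan.jar"]  # hypothetical jar path


def somatic_pipeline(normal_pileup: str, tumor_pileup: str, out: str,
                     min_var_freq: float = 0.10,
                     somatic_p: float = 0.05,
                     hc_p: float = 0.07) -> list[list[str]]:
    cmds = [
        # 1. Raw calls; in native format this writes <out>.snp and <out>.indel.
        VARSCAN + ["somatic", normal_pileup, tumor_pileup, out,
                   "--min-var-freq", str(min_var_freq),
                   "--somatic-p-value", str(somatic_p)],
    ]
    # 2. Split each file by somatic status (Somatic / Germline / LOH),
    #    with a high-confidence (.hc) subset per class.
    for kind in ("snp", "indel"):
        cmds.append(VARSCAN + ["processSomatic", f"{out}.{kind}",
                               "--p-value", str(hc_p)])
    return cmds


if __name__ == "__main__":
    for cmd in somatic_pipeline("normal.pileup", "tumor.pileup", "sample"):
        print(shlex.join(cmd))
```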

Hello ATpoint,

Thank you for your input. I have also been doing something along the same lines (somatic > processSomatic) to look at somatic variants, but was wondering how different the results would be if I took approach 1 instead. Would the results be mostly similar? I am trying to switch over from approach 1 to 2 and wanted to find out how I can fairly assess whether it is worth switching. Any advice is appreciated!

Thank you!

I actually don't know the internals of VarScan2 in detail (and I switched to Strelka2 because it is still maintained), but given that a somatic mode exists, I would use it. Anything else will require custom code, and that code will need its own evaluation to test whether the variants it produces are actually reliable. Pulling out the somatic variants requires some kind of test of whether the allele frequency/allelic count in the tumor is significantly higher or lower than in the germline (normal), which is pretty much what is implemented in processSomatic via (I think) a Fisher's exact test. So why go to that effort when somatic callers exist? Still, if you are starting a new project, switching to a more recent tool might be a good idea.
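To make the statistical idea concrete, here is a standard-library Python sketch of a one-sided Fisher's exact test on tumor vs normal read counts. It is the textbook test, not VarScan2's code; as in the comment above, it is an assumption that processSomatic uses this kind of test, and its tail choice and details may differ. The sketch only covers the "higher in tumor" direction; the "lower" direction (loss of heterozygosity) would use the other tail.

```python
"""One-sided Fisher's exact test on a 2x2 table of read counts.

Textbook version for illustration; not VarScan2's implementation.
"""
from math import comb


def somatic_fisher_p(tumor_ref: int, tumor_alt: int,
                     normal_ref: int, normal_alt: int) -> float:
    """P(tumor has >= tumor_alt variant reads), margins fixed, no true difference.

    Under the null, the tumor's variant-read count is hypergeometric: draw
    n = tumor depth reads from N total reads, of which K support the variant.
    """
    n = tumor_ref + tumor_alt          # tumor depth
    K = tumor_alt + normal_alt         # all variant-supporting reads
    N = n + normal_ref + normal_alt    # all reads in both samples
    upper = min(K, n)                  # largest possible tumor variant count
    # comb() returns 0 when k > n, so impossible tables contribute nothing.
    tail = sum(comb(K, x) * comb(N - K, n - x)
               for x in range(tumor_alt, upper + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    print(somatic_fisher_p(20, 10, 30, 0))  # variant seen only in the tumor
    print(somatic_fisher_p(5, 5, 5, 5))     # same counts in both samples
```

A site with 10 of 30 tumor reads supporting the variant and none of 30 normal reads gets a p-value well below 0.01, whereas identical counts in both samples give a p-value above 0.5.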

