Clear explanation of the --min-reads2 parameter in VarScan
6.1 years ago
John ▴ 160

Dear all,

I have tried hard to find a clear answer on Google, but it seems that only VarScan's creators understand its parameters.

I would appreciate it if somebody could explain, with an example, what the --min-reads2 parameter does in VarScan's germline caller.

I tried changing this value on a set of samples:

min-reads2=1 ----> number of SNPs = 58
min-reads2=2 ----> number of SNPs = 58
min-reads2=3 ----> number of SNPs = 58
min-reads2=4 ----> number of SNPs = 58
min-reads2=20 ----> number of SNPs = 31


So the only difference appeared when I set min-reads2 to a high value. When I compare the VCFs, I can see that the 27 missing variants are all ones where AC=1 (allele count in genotypes). So the min-reads2 filter probably depends on the AC value.

Does anybody understand this parameter? Please do not copy the explanation from the manual ("Minimum supporting reads at a position to call variants [2]"); I need an explanation with an example.

Thank you very much.

John.

Varscan sequencing vcf • 2.5k views

Hi John, I think there are default parameters affecting the final outcome. The default somatic p-value is 0.05, so this might be overriding calls with low read-depth support even when min-reads2 is altered. And, as you noticed, other factors such as the AC value can have an effect too. You can try lowering the p-value threshold (--somatic-p-value) and then altering min-reads2.


Thank you, Amitm. I have been playing with all the parameters. But do you understand what the --min-reads2 parameter does? Note that I am using VarScan for germline mutations. Thank you.


As far as I understand, that is the minimum read support required for the variant allele (repeating what is written on the website). So, in germline mode (you are probably using mpileup2snp), it is the minimum number of reads that must support the variant allele. An example (a VCF line from single-sample variant calling using mpileup2snp):

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Sample1


For this variant, there were 149 reads supporting the reference allele (5th value, RD, in the last column) and 84 reads supporting the variant allele (6th value, AD, in the last column). The --min-reads2 parameter controls how low the AD value can go.
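To make the RD/AD lookup concrete, here is a minimal Python sketch. It assumes the usual VarScan single-sample FORMAT layout (GT:GQ:SDP:DP:RD:AD:...), and the sample column is hypothetical: only RD=149 and AD=84 come from the post above, and every other value is invented for illustration. It only mimics the comparison against the threshold and is not VarScan's own code.

```python
# Hypothetical sample column in VarScan's single-sample VCF layout.
# Only RD=149 and AD=84 come from the post; all other values are made up.
FORMAT = "GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR"
SAMPLE = "0/1:255:233:233:149:84:36.05%:1.0E-30:38:37:80:69:45:39"


def parse_sample(format_field, sample_field):
    """Pair each FORMAT key with its value from the sample column."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def passes_min_reads2(sample, min_reads2):
    """True if the variant-supporting read count (AD) reaches the threshold."""
    return int(sample["AD"]) >= min_reads2


sample = parse_sample(FORMAT, SAMPLE)
print("RD =", sample["RD"], "AD =", sample["AD"])

# With AD = 84, the call survives any threshold up to 84 and drops out above it.
for threshold in (1, 2, 20, 84, 85):
    print(threshold, passes_min_reads2(sample, threshold))
```

Read this way, raising the threshold from 1 to 4 changes nothing if every call already has at least 4 variant-supporting reads, which would be consistent with the unchanged count of 58 in the question.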


Great, looking at my data, it makes sense now. So doesn't that mean the --min-var-freq parameter does almost the same thing? Looking at your data, AD = 84 and ADP = 233, so your frequency is 36.05% (the calculation checks out). So basically I only need to set either --min-var-freq or --min-reads2, don't I? The algorithm probably applies --min-reads2 first and then checks the --min-var-freq condition.
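A small Python sketch of the arithmetic, and of why a read-count threshold and a frequency threshold are not interchangeable. The function names and the two extra sites are hypothetical, and the sketch makes no claim about the order in which VarScan evaluates the two conditions; it simply requires both.

```python
def variant_frequency(ad, depth):
    """Fraction of reads at the site that support the variant allele."""
    return ad / depth


def passes(ad, depth, min_reads2, min_var_freq):
    """Require both an absolute read count and a relative frequency."""
    return ad >= min_reads2 and variant_frequency(ad, depth) >= min_var_freq


# The numbers from the thread: 84 variant reads out of 233.
print(round(variant_frequency(84, 233) * 100, 2))  # 36.05

# Hypothetical deep site: plenty of variant reads, but a low frequency.
print(passes(ad=15, depth=1000, min_reads2=10, min_var_freq=0.20))  # False

# Hypothetical shallow site: a high frequency, but very few reads.
print(passes(ad=3, depth=6, min_reads2=10, min_var_freq=0.20))  # False
```

The deep site fails only on frequency and the shallow site fails only on read count, so each threshold catches cases the other would let through.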


Hi, they are two ways of filtering or restricting the calls. You could use either or both. For amplicon-seq data I do it like this: 1) First call all variants (using mpileup2snp) with a moderate read-depth criterion (say --min-reads2 at 10) and the desired --min-var-freq (maybe 1%). 2) Then use the filter module to apply a stronger read-depth criterion, say --min-reads2 at 30 this time.

This returns low-frequency calls that have the added support of good read depth. The advantage of the two-tiered approach is that you can look for low-confidence calls in the first VCF if needed.
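The two-tier idea can be sketched in Python as two passes over the same calls. The records, field names and the keep() helper below are hypothetical stand-ins for VCF rows; the thresholds (10, then 30, at 1% frequency) are the ones mentioned above. This illustrates the logic only and does not reproduce the VarScan filter module.

```python
# Hypothetical calls: ad = variant-supporting reads, depth = total reads.
calls = [
    {"pos": 101, "ad": 8, "depth": 900},
    {"pos": 202, "ad": 12, "depth": 1000},
    {"pos": 303, "ad": 45, "depth": 1500},
]


def keep(call, min_reads2, min_var_freq):
    """Apply a read-count threshold and a frequency threshold together."""
    frequency = call["ad"] / call["depth"]
    return call["ad"] >= min_reads2 and frequency >= min_var_freq


# Tier 1: permissive calling (min-reads2 = 10, min-var-freq = 1%).
tier1 = [c for c in calls if keep(c, 10, 0.01)]

# Tier 2: stricter read support on the tier-1 calls (min-reads2 = 30).
tier2 = [c for c in tier1 if keep(c, 30, 0.01)]

# Calls that pass tier 1 only remain available for manual review.
low_confidence = [c for c in tier1 if c not in tier2]

print([c["pos"] for c in tier1])           # [202, 303]
print([c["pos"] for c in tier2])           # [303]
print([c["pos"] for c in low_confidence])  # [202]
```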


Thank you so much for the clarification. I tried the VarScan filter command, and when I use just --min-var-freq and --min-reads2 it works, but it still also filters by strand (failed strands). I can see in the manual that the strand filter is not an option there (only for somaticFilter). Have you had the same experience, and why do some variants still fail the strand filter? Can I turn it off? Thank you for sharing your experience.


Good that you could resolve some of your queries. As for the strand filter, are you still talking about mpileup2snp? If so, check the --strand-filter option. By default it is 1. You can turn it off by passing 0 instead.

Generally it is not a good idea to turn off the strand filter. Whether in somatic mode or single-sample/germline calling mode, I keep it on. The filter comes into play when there is a strong imbalance between the number of variant-supporting reads on the plus strand and on the minus strand. Check the ADF and ADR values in the VCF for single-sample calling, and the DP4 value for somatic-mode calling.
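A rough Python sketch of a strand-imbalance check using ADF and ADR. The 90% cutoff is an assumption based on my reading of VarScan's usage text ("ignore variants with >90% support on one strand"); the exact rule inside VarScan may differ, and the function names and read counts here are hypothetical.

```python
def strand_fraction(adf, adr):
    """Largest share of variant-supporting reads coming from a single strand."""
    total = adf + adr
    if total == 0:
        return 0.0
    return max(adf, adr) / total


def fails_strand_filter(adf, adr, max_one_strand=0.90):
    """Flag a variant whose support sits almost entirely on one strand.

    max_one_strand=0.90 is an assumed cutoff, not a value confirmed in this thread.
    """
    return strand_fraction(adf, adr) > max_one_strand


# Balanced support (hypothetical counts): 45 forward reads, 39 reverse reads.
print(fails_strand_filter(45, 39))  # False

# Skewed support (hypothetical counts): 58 forward reads, 2 reverse reads.
print(fails_strand_filter(58, 2))   # True
```

A variant like the second one can have many supporting reads and a healthy frequency and still be dropped, which is why calls can fail on strand even when --min-reads2 and --min-var-freq are satisfied.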