Dear All,
I am trying to filter false-positive variant calls using varscan somaticFilter. However, it treats all of my variants as being near indels and removes every one of them. I have done GATK realignment around indels (twice now), but this has not resolved the problem. Here is my variant-calling code:
samtools mpileup -f $BAMS/hg38.fa -q 1 $BAMS/$gBAM$ext1 $BAMS/$fcBAM$ext1 | \
varscan somatic -mpileup $fcBAM --min-coverage-normal 8 \
--min-coverage-tumor 8 --p-value 0.05 --min-var-freq 0.02 --strand-filter 1 --output-vcf 1
And here is my code to filter false-positive variants:
varscan somaticFilter $varscan_somatic/$i$ext1 --min-coverage 8 --min-reads2 2 --min-strands2 2 --min-var-freq 0.02 \
--indel-file $varscan_somatic/$i$ext1 --output-file $varscan_somatic/$i$filtered
Here is the output of varscan somaticFilter:
Window size: 10
Window SNPs: 3
Indel margin: 3
Reading input from /home/rmhawwo/Scratch/varscan_somatic/fc25.snp.vcf
13955 cluster SNPs identified
Reading input from /home/rmhawwo/Scratch/varscan_somatic/fc25.snp.vcf
43758 variants in input stream
1066 failed to meet coverage requirement
260 failed to meet reads2 requirement
38 failed to meet varfreq requirement
5135 failed to meet p-value requirement
4847 in SNP clusters were removed
32412 were removed near indels
0 passed filters
Here are the first few lines of the SNP calls:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
chr1 13273 . G C . PASS DP=158;SS=1;SSC=0;GPV=4.7946E-28;SPV=9.8322E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:71:30:39:56.52%:22,8,37,2 0/1:.:87:51:35:40.7%:44,7,30,5
chr1 14610 . T C . PASS DP=45;SOMATIC;SS=2;SSC=3;GPV=1E0;SPV=4.101E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:16:16:0:0%:16,0,0,0 0/1:.:29:27:2:6.9%:27,0,2,0
chr1 14653 . C T . PASS DP=131;SS=1;SSC=4;GPV=8.6909E-13;SPV=3.7801E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:37:28:9:24.32%:27,1,9,0 0/1:.:94:66:27:29.03%:63,3,27,0
chr1 14776 . G A . PASS DP=192;SS=1;SSC=6;GPV=2.2557E-5;SPV=2.2983E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:63:55:3:5.17%:0,55,0,3 0/1:.:129:111:12:9.76%:0,111,0,12
chr1 14798 . C G . PASS DP=181;SS=1;SSC=4;GPV=9.7008E-5;SPV=3.7571E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:57:52:3:5.45%:1,51,0,3 0/1:.:124:111:10:8.26%:1,110,0,10
chr1 16487 . T C . PASS DP=52;SS=1;SSC=0;GPV=6.2398E-3;SPV=7.9924E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:18:15:3:16.67%:15,0,3,0 0/1:.:34:28:4:12.5%:28,0,3,1
chr1 16495 . G C . PASS DP=51;SS=1;SSC=0;GPV=1.1021E-9;SPV=8.6543E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:18:7:10:58.82%:7,0,10,0 0/1:.:33:16:14:46.67%:15,1,14,0
chr1 17538 . C A . PASS DP=92;SS=1;SSC=4;GPV=3.4872E-4;SPV=3.8557E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:48:42:5:10.64%:38,4,3,2 0/1:.:44:34:6:15%:15,19,2,4
chr1 65797 . T C . PASS DP=23;SOMATIC;SS=2;SSC=12;GPV=1E0;SPV=5.4545E-2 GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:13:13:0:0%:13,0,0,0 0/1:.:10:6:3:33.33%:6,0,3,0
chr1 65872 . T G . PASS DP=68;SS=1;SSC=1;GPV=2.8856E-2;SPV=7.4755E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:37:33:3:8.33%:33,0,3,0 0/1:.:31:27:2:6.9%:27,0,2,0
chr1 69270 . A G . PASS DP=52;SS=1;SSC=0;GPV=6.1512E-28;SPV=1E0 GT:GQ:DP:RD:AD:FREQ:DP4 1/1:.:28:0:25:100%:0,0,25,0 1/1:.:24:0:22:100%:0,0,22,0
And here are the first few lines of indel calls:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
chr1 13417 . C CGAGA . PASS DP=173;SS=1;SSC=0;GPV=2.0462E-24;SPV=8.5199E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:62:35:27:43.55%:0,35,0,27 0/1:.:111:69:40:36.7%:0,69,0,40
chr1 15903 . G GC . PASS DP=53;SS=1;SSC=15;GPV=1E0;SPV=3.1275E-2 GT:GQ:DP:RD:AD:FREQ:DP4 1/1:.:35:8:27:77.14%:6,2,15,12 1/1:.:18:0:17:100%:0,0,1,16
chr1 129010 . AATG A . PASS DP=46;SS=1;SSC=4;GPV=1.3026E-2;SPV=3.3202E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:23:20:2:9.09%:20,0,2,0 0/1:.:23:18:4:18.18%:18,0,4,0
chr1 129148 . G GT . PASS DP=239;SS=1;SSC=3;GPV=3.6603E-3;SPV=4.2833E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:114:100:3:2.91%:50,50,0,3 0/1:.:125:111:5:4.31%:45,66,4,1
chr1 186111 . CCAAA C . PASS DP=37;SOMATIC;SS=2;SSC=7;GPV=1E0;SPV=1.982E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:15:15:0:0%:14,1,0,0 0/1:.:22:19:3:13.64%:19,0,3,0
chr1 188025 . CT C . PASS DP=29;SS=3;SSC=14;GPV=1E0;SPV=3.2841E-2 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:10:7:3:30%:7,0,3,0 0/0:.:19:19:0:0%:17,2,0,0
chr1 189392 . ACC A . PASS DP=141;SS=1;SSC=2;GPV=8.2532E-25;SPV=5.082E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:74:37:34:47.89%:37,0,34,0 0/1:.:67:32:31:49.21%:32,0,31,0
chr1 189713 . GC G . PASS DP=45;SS=1;SSC=0;GPV=2.764E-2;SPV=8.4207E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:22:18:3:14.29%:0,18,0,3 0/1:.:23:20:2:9.09%:0,20,0,2
chr1 727679 . T TG . PASS DP=114;SOMATIC;SS=2;SSC=3;GPV=1E0;SPV=4.0132E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:40:35:0:0%:31,4,0,0 0/1:.:74:59:2:3.28%:39,20,1,1
chr1 939436 . C CT . PASS DP=48;SS=1;SSC=1;GPV=7.5306E-6;SPV=7.9348E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:13:8:5:38.46%:0,8,0,5 0/1:.:35:22:10:31.25%:1,21,1,9
chr1 956333 . TG T . PASS DP=33;SOMATIC;SS=2;SSC=5;GPV=1E0;SPV=2.5862E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:14:14:0:0%:0,14,0,0 0/1:.:19:13:2:13.33%:0,13,0,2
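For reference, here is a rough sketch of how the SNP-to-indel proximity could be checked independently of VarScan. It is only an approximation under a few assumptions: the function name `count_snps_near_indels` is made up for illustration, an indel is represented by its VCF POS alone (deletion length is ignored), and the window is a simple symmetric margin, which may not match exactly how somaticFilter measures distance.

```shell
# Count SNP calls lying within MARGIN bp of any indel position.
# Usage: count_snps_near_indels SNP_VCF INDEL_VCF [MARGIN]
# Assumption: each indel is represented by its POS column only.
count_snps_near_indels() {
    awk -v margin="${3:-3}" '
        /^#/ { next }                        # skip VCF header lines
        FNR == NR {                          # first file read: indel calls
            for (p = $2 - margin; p <= $2 + margin; p++)
                near[$1 SUBSEP p] = 1        # mark every position inside the window
            next
        }
        {                                    # second file read: SNP calls
            total++
            if (($1 SUBSEP $2) in near) hits++
        }
        END { printf "%d of %d SNPs within %d bp of an indel\n", hits, total, margin }
    ' "$2" "$1"
}
```

It would be called as, for example, `count_snps_near_indels fc25.snp.vcf fc25.indel.vcf 3`, where the indel file name is assumed here from the SNP file name and the margin of 3 is taken from the log above.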
The SNPs are in most cases at least a few hundred bases away from the nearest indel, and this is just the start of the list. I am wondering why they would be filtered, as I have read that SNPs are only removed if they lie within a few bases of an indel (the log above reports an indel margin of 3). Has anyone come across this before, or does anyone have a solution? I have tried several call files and encounter the same problem with all of them. Perhaps there is something simple that I have missed? Many thanks