I've run a script that calls variants with samtools mpileup piped into VarScan. The input to mpileup is 7 samples from each of 9 different patients, i.e. this is multi-sample variant calling.
The data is WES, and I am interested in 19 genes, which I specified to mpileup with a BED file.
However, I'm quite sure something has gone wrong with the resulting VCF file.
Firstly, a lot of variants have something like ./.:.:1 in the genotype columns. I'm not quite sure what this means. From looking in IGV, it seems there are no reads mapped to that location for the sample; is this correct?
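My current reading of that column, assuming the values line up with the FORMAT keys in order and that trailing fields have simply been dropped (which I believe the VCF spec allows), is GT=./. (no call), GQ missing, and SDP=1, where SDP is, as far as I understand, the raw read depth reported by samtools. A minimal Python sketch of that reading; parse_sample is just a helper name I made up, not part of any tool:

```python
def parse_sample(format_field, sample_field):
    """Pair FORMAT keys with one sample's values, in order."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    # Assumption: omitted trailing values are treated as missing (".").
    values += ["."] * (len(keys) - len(values))
    return dict(zip(keys, values))


fmt = "GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR"
fields = parse_sample(fmt, "./.:.:1")
print(fields["GT"], fields["GQ"], fields["SDP"], fields["DP"])
```

If that reading is right, these samples have no genotype call and a raw depth of 1, which would not obviously match what I see in IGV.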
Additionally, some variants appear in regions not specified in the BED file. For example, a variant appears at chr4 base 178937719, which is not within any of the regions I specified on chr4. In this example, all samples return ./.:.:1 except for one sample in one 110. But looking at this position in IGV, there are no reads present in that sample.
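To rule out an off-by-one on my side, this is roughly how I would check whether a VCF position falls inside the BED regions, assuming the usual conventions (BED intervals 0-based and half-open, VCF positions 1-based). The intervals below are hypothetical placeholders, not my real BED file, and in_bed is a made-up helper name:

```python
def in_bed(chrom, pos, regions):
    """True if 1-based VCF position `pos` lies in any 0-based, half-open BED interval."""
    return any(c == chrom and start < pos <= end for c, start, end in regions)


# Hypothetical intervals for illustration only.
regions = [("chr4", 1000, 2000), ("chr4", 5000, 6000)]

print(in_bed("chr4", 1001, regions))       # first base covered by (1000, 2000)
print(in_bed("chr4", 1000, regions))       # one base before the interval
print(in_bed("chr4", 178937719, regions))  # the position from my VCF
```

With my real regions loaded in place of the placeholders, chr4 178937719 is not covered by any interval.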
To make matters worse, this is repeated over and over. In fact, for such locations variants are being called one base after the other, with identical genomic information to one another. These are flooding the VCF file, which contains 91245 supposed variants.
I'm not quite sure what is causing this. I used the same reference (hg19) for mapping (BWA) and for samtools mpileup.
In the mpileup log, a lot of lines appear saying something like: [mplp_func] Skipping because 187532594 is outside of 180915260 [ref:4]. This one, for example, appears when looking at chr4 187508947-187645010.
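One thing I plan to check, since the log compares a chr4 coordinate against a length that may belong to a different contig, is whether the contig names, lengths and order in the BAM headers match the reference index. This is only a sketch under that assumption: the helper names are made up, the toy inputs use placeholder contigs, and in practice the text would come from the reference .fai file and from samtools view -H on each BAM.

```python
from itertools import zip_longest


def parse_fai(text):
    """Return [(name, length)] from the text of a FASTA .fai index."""
    rows = (line.split("\t") for line in text.splitlines() if line.strip())
    return [(row[0], int(row[1])) for row in rows]


def parse_sam_header(text):
    """Return [(name, length)] from the @SQ lines of a SAM/BAM header."""
    contigs = []
    for line in text.splitlines():
        if line.startswith("@SQ"):
            tags = dict(tag.split(":", 1) for tag in line.split("\t")[1:])
            contigs.append((tags["SN"], int(tags["LN"])))
    return contigs


def contig_mismatches(ref, bam):
    """List (index, ref_entry, bam_entry) wherever the two orderings disagree."""
    pairs = enumerate(zip_longest(ref, bam))
    return [(i, r, b) for i, (r, b) in pairs if r != b]


# Toy inputs with placeholder contigs, for illustration only.
fai_text = "chrA\t100\t6\t60\t61\nchrB\t200\t115\t60\t61\n"
header_text = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chrB\tLN:200\n@SQ\tSN:chrA\tLN:100\n"

for mismatch in contig_mismatches(parse_fai(fai_text), parse_sam_header(header_text)):
    print(mismatch)
```

An empty result for every BAM would suggest the reference and alignments agree on contig order and length; any output would point at a mismatch between the two.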
I can't think what might be wrong. Would anybody be able to explain the issue here and describe how to fix it?
Here is an example of what I'm dealing with:
chr4 30720203 . A G . PASS ADP=1;WT=48;HET=0;HOM=2;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1
chr4 30720204 . T C . PASS ADP=1;WT=48;HET=0;HOM=2;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1
chr4 30720205 . G C . PASS ADP=1;WT=48;HET=0;HOM=2;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1
chr4 30720206 . G A . PASS ADP=1;WT=48;HET=0;HOM=2;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1
chr4 30720207 . A T . PASS ADP=1;WT=48;HET=0;HOM=2;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1
chr4 30720209 . A G . PASS ADP=1;WT=48;HET=0;HOM=2;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1
chr4 30720210 . C G . PASS ADP=1;WT=48;HET=0;HOM=2;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1
chr4 30720211 . G T . PASS ADP=1;WT=48;HET=0;HOM=2;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1
chr4 30720212 . T A . PASS ADP=1;WT=48;HET=0;HOM=2;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1
chr4 30720213 . C A . PASS ADP=1;WT=48;HET=0;HOM=2;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1
chr4 30720214 . A T . PASS ADP=1;WT=48;HET=0;HOM=2;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1
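To get a feel for how many of the 91245 records look like the ones above, I would count the lines in which every sample column is a no-call. A rough sketch, assuming the sample columns start at the tenth field; all_no_call is a made-up helper, the first test line is copied from the records above as pasted (only the sample columns shown), and the second is a hypothetical record with one called genotype:

```python
def all_no_call(vcf_line):
    """True if every sample column in a VCF data line has GT './.' or '.'."""
    # split() without arguments handles both tabs and the spaces in pasted text.
    cols = vcf_line.split()
    samples = cols[9:]
    return bool(samples) and all(s.split(":")[0] in ("./.", ".") for s in samples)


pasted = ("chr4 30720203 . A G . PASS ADP=1;WT=48;HET=0;HOM=2;NC=0 "
          "GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR "
          "./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1 ./.:.:1")
# Hypothetical record with one called sample, for contrast.
called = "chr4 100 . A G . PASS ADP=20 GT:GQ 0/1:30 ./.:."

print(all_no_call(pasted))
print(all_no_call(called))
```

Applied to the whole file (skipping the header lines that start with #), this would show what fraction of the calls carry no genotype in any sample.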