I'm trying to understand the output formats of different SNP callers. I have made test BAM files with a subset of two stickleback WGS samples, and I made a VCF using VarScan's `mpileup2snp` command.
Here's an example of a single SNP in the VCF:
chrI 1789 . T G . PASS ADP=105;WT=1;HET=1;HOM=0;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 0/0:66:108:108:87:18:17.14%:1.7211E-6:63:74:64:23:15:3 0/1:68:102:102:73:21:22.34%:1.3568E-7:60:56:48:25:19:2
I can understand most of the values in the FORMAT field, but I'm confused about how DP is calculated. The reference and alternative allele depths are given (RD and AD), but their sum doesn't equal DP. Using the example above:
Sample 1:
- DP = 108
- AD = 18
- RD = 87
- AD + RD = 105
Sample 2:
- DP = 102
- AD = 21
- RD = 73
- AD + RD = 94
Does anyone have any ideas why the reported DP doesn't equal AD + RD?
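To make the mismatch concrete, here is a minimal Python sketch that parses the two sample columns from the record above and compares DP with RD + AD. The helper name `depth_summary` is my own (hypothetical) and is not part of VarScan. The script only redoes the arithmetic from the VCF line and makes no claim about how VarScan computes these values internally.

```python
# FORMAT keys and sample columns copied verbatim from the VCF record above.
FORMAT = "GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR"
SAMPLES = [
    "0/0:66:108:108:87:18:17.14%:1.7211E-6:63:74:64:23:15:3",
    "0/1:68:102:102:73:21:22.34%:1.3568E-7:60:56:48:25:19:2",
]


def depth_summary(format_str, sample_str):
    """Hypothetical helper: compare DP with RD + AD for one sample column."""
    fields = dict(zip(format_str.split(":"), sample_str.split(":")))
    dp, rd, ad = int(fields["DP"]), int(fields["RD"]), int(fields["AD"])
    called = rd + ad
    return {
        "DP": dp,
        "RD+AD": called,
        # Reads counted in DP but in neither RD nor AD.
        "unaccounted": dp - called,
        # Variant frequency recomputed from the allele depths only.
        "freq_pct": round(100 * ad / called, 2),
        "reported_freq": fields["FREQ"],
    }


summaries = [depth_summary(FORMAT, s) for s in SAMPLES]
for i, summary in enumerate(summaries, start=1):
    print(f"Sample {i}: {summary}")
```

For this record the gap is 3 reads in sample 1 and 8 reads in sample 2. The reported FREQ (17.14% and 22.34%) matches AD / (RD + AD) rather than AD / DP, so the extra reads in DP don't seem to enter the frequency calculation.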
I don't think it's that simple. Have a look at the code: VarScan calls a method named `getReadCounts`:
```
(...)
```