Question

VarScan: How is the DP score in a .vcf calculated?

1

Entering edit mode

6.5 years ago

James Reeve ▴ 130

I'm trying to understand the formats of different SNP callers. I have made test bam files with a subset of two stickleback WGS samples. I made a vcf using VarScan's commad mpileup2snp.

Here's an example of a single SNP in the vcf;

chrI    1789    .       T       G       .       PASS    ADP=105;WT=1;HET=1;HOM=0;NC=0   GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR    0/0:66:108:108:87:18:17.14%:1.7211E-6:63:74:64:23:15:3  0/1:68:102:102:73:21:22.34%:1.3568E-7:60:56:48:25:19:2

I can understand most of the scores in the #FORMAT field, however I'm confused how DP is calculated. The reference and alternative allele depths are given (RD & AD), but the sum of the two doesn't equal DP. Using the example above;

Sample 1:

DP = 108
AD = 18
RD = 87
AD + RD = 105

Sample 2 :

DP = 102
AD = 21
RD = 73
AD + RD = 94

Does anyone have any ideas why the reported DP doesn't equal AD + RD?

VarScan VCF • 2.7k views

ADD COMMENT • link updated 6.5 years ago by igor 13k • written 6.5 years ago by James Reeve ▴ 130

0

Entering edit mode

I don't think it's so simple, have a look at the code, varscan calls a method getReadCounts

```

  HashMap<String, String> readCounts = VarScan.getReadCounts(refBase, readBases, readQualities, minAvgQual, mapQualities);
  (...)

/**
 * Counts the number, quality, and strands of each allele from a pileup
 *
 * @param   refBase     Reference base at this position
 * @param   readBases   String of read bases from pileup
 * @param   readQuals   String of read base qualities from pileup
 * @param   minAvgQual  Integer of minimum required base quality to count a base.
 * @return  results     HashMap<String, String> of results for each allele
 */
static HashMap<String, String> getReadCounts(String refBase, String readBases, String readQuals, int minAvgQual, String mapQuals)
{
    HashMap<String, Integer> readCounts = new HashMap<String, Integer>();
    HashMap<String, Integer> readCountsPlus = new HashMap<String, Integer>();
    HashMap<String, Integer> readCountsMinus = new HashMap<String, Integer>();
    HashMap<String, Integer> qualitySum = new HashMap<String, Integer>();
    HashMap<String, Integer> mapQualitySum = new HashMap<String, Integer>();
    HashMap<String, String> strandsSeen = new HashMap<String, String>();

(...) ````

ADD REPLY • link 6.5 years ago by Pierre Lindenbaum 166k

score 1 · Accepted Answer · 2019-01-24

Different variant have different rules regarding thresholds for counting reads. Additionally, these criteria may vary for different fields. VarScan offers an explanation on their website:

VarScan requires that bases meet the minimum Phred quality score (default 15 for most commands) to count them for things like read counts (reads1, reads2) and to compute variant allele frequency. However, when VarScan reports the depth (such as in the DP field of VCF output), it reports SAMtools raw depth. To get VarScan read counts to more closely match another tool, set use parameter --min-avg-qual 0. And use caution! Low-quality bases, with the occasional exception of BAQ penalties, should not be trusted.

Also, VarScan reports variants on a biallelic basis. That is, for a given SNP call, the "reads1" column is the number of reference-supporting reads (RD), and the "reads2" column is the number of variant-supporting reads (AD). There may be additional reads at that position showing other bases (SNP or indel variants). If these other variants meet the calling criteria, they will be reported in their own line. If not, it may look like you have "missing" reads.