Question: VarScan: How is the DP score in a .vcf calculated?
gravatar for James Reeve
2.1 years ago by
James Reeve110
James Reeve110 wrote:

I'm trying to understand the formats of different SNP callers. I have made test bam files with a subset of two stickleback WGS samples. I made a vcf using VarScan's commad mpileup2snp.

Here's an example of a single SNP in the vcf;

chrI    1789    .       T       G       .       PASS    ADP=105;WT=1;HET=1;HOM=0;NC=0   GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR    0/0:66:108:108:87:18:17.14%:1.7211E-6:63:74:64:23:15:3  0/1:68:102:102:73:21:22.34%:1.3568E-7:60:56:48:25:19:2

I can understand most of the scores in the #FORMAT field, however I'm confused how DP is calculated. The reference and alternative allele depths are given (RD & AD), but the sum of the two doesn't equal DP. Using the example above;

Sample 1:

  • DP = 108
  • AD = 18
  • RD = 87
  • AD + RD = 105

Sample 2 :

  • DP = 102
  • AD = 21
  • RD = 73
  • AD + RD = 94

Does anyone have any ideas why the reported DP doesn't equal AD + RD?

varscan vcf • 1.0k views
ADD COMMENTlink modified 2.1 years ago by igor12k • written 2.1 years ago by James Reeve110

I don't think it's so simple, have a look at the code, varscan calls a method getReadCounts


  HashMap<String, String> readCounts = VarScan.getReadCounts(refBase, readBases, readQualities, minAvgQual, mapQualities);

 * Counts the number, quality, and strands of each allele from a pileup
 * @param   refBase     Reference base at this position
 * @param   readBases   String of read bases from pileup
 * @param   readQuals   String of read base qualities from pileup
 * @param   minAvgQual  Integer of minimum required base quality to count a base.
 * @return  results     HashMap<String, String> of results for each allele
static HashMap<String, String> getReadCounts(String refBase, String readBases, String readQuals, int minAvgQual, String mapQuals)
    HashMap<String, Integer> readCounts = new HashMap<String, Integer>();
    HashMap<String, Integer> readCountsPlus = new HashMap<String, Integer>();
    HashMap<String, Integer> readCountsMinus = new HashMap<String, Integer>();
    HashMap<String, Integer> qualitySum = new HashMap<String, Integer>();
    HashMap<String, Integer> mapQualitySum = new HashMap<String, Integer>();
    HashMap<String, String> strandsSeen = new HashMap<String, String>();

(...) ````

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by Pierre Lindenbaum134k
gravatar for igor
2.1 years ago by
United States
igor12k wrote:

Different variant have different rules regarding thresholds for counting reads. Additionally, these criteria may vary for different fields. VarScan offers an explanation on their website:

VarScan requires that bases meet the minimum Phred quality score (default 15 for most commands) to count them for things like read counts (reads1, reads2) and to compute variant allele frequency. However, when VarScan reports the depth (such as in the DP field of VCF output), it reports SAMtools raw depth. To get VarScan read counts to more closely match another tool, set use parameter --min-avg-qual 0. And use caution! Low-quality bases, with the occasional exception of BAQ penalties, should not be trusted.

Also, VarScan reports variants on a biallelic basis. That is, for a given SNP call, the "reads1" column is the number of reference-supporting reads (RD), and the "reads2" column is the number of variant-supporting reads (AD). There may be additional reads at that position showing other bases (SNP or indel variants). If these other variants meet the calling criteria, they will be reported in their own line. If not, it may look like you have "missing" reads.

ADD COMMENTlink written 2.1 years ago by igor12k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1095 users visited in the last hour