Question: VarScan: How is the DP score in a .vcf calculated?
1
gravatar for James Reeve
14 months ago by
James Reeve100
Canada
James Reeve100 wrote:

I'm trying to understand the formats of different SNP callers. I have made test bam files with a subset of two stickleback WGS samples. I made a vcf using VarScan's commad mpileup2snp.

Here's an example of a single SNP in the vcf;

chrI    1789    .       T       G       .       PASS    ADP=105;WT=1;HET=1;HOM=0;NC=0   GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR    0/0:66:108:108:87:18:17.14%:1.7211E-6:63:74:64:23:15:3  0/1:68:102:102:73:21:22.34%:1.3568E-7:60:56:48:25:19:2

I can understand most of the scores in the #FORMAT field, however I'm confused how DP is calculated. The reference and alternative allele depths are given (RD & AD), but the sum of the two doesn't equal DP. Using the example above;

Sample 1:

  • DP = 108
  • AD = 18
  • RD = 87
  • AD + RD = 105

Sample 2 :

  • DP = 102
  • AD = 21
  • RD = 73
  • AD + RD = 94

Does anyone have any ideas why the reported DP doesn't equal AD + RD?

varscan vcf • 651 views
ADD COMMENTlink modified 14 months ago by igor9.9k • written 14 months ago by James Reeve100

I don't think it's so simple, have a look at the code, varscan calls a method getReadCounts

```

  HashMap<String, String> readCounts = VarScan.getReadCounts(refBase, readBases, readQualities, minAvgQual, mapQualities);
  (...)

/**
 * Counts the number, quality, and strands of each allele from a pileup
 *
 * @param   refBase     Reference base at this position
 * @param   readBases   String of read bases from pileup
 * @param   readQuals   String of read base qualities from pileup
 * @param   minAvgQual  Integer of minimum required base quality to count a base.
 * @return  results     HashMap<String, String> of results for each allele
 */
static HashMap<String, String> getReadCounts(String refBase, String readBases, String readQuals, int minAvgQual, String mapQuals)
{
    HashMap<String, Integer> readCounts = new HashMap<String, Integer>();
    HashMap<String, Integer> readCountsPlus = new HashMap<String, Integer>();
    HashMap<String, Integer> readCountsMinus = new HashMap<String, Integer>();
    HashMap<String, Integer> qualitySum = new HashMap<String, Integer>();
    HashMap<String, Integer> mapQualitySum = new HashMap<String, Integer>();
    HashMap<String, String> strandsSeen = new HashMap<String, String>();

(...) ````

ADD REPLYlink modified 14 months ago • written 14 months ago by Pierre Lindenbaum127k
1
gravatar for igor
14 months ago by
igor9.9k
United States
igor9.9k wrote:

Different variant have different rules regarding thresholds for counting reads. Additionally, these criteria may vary for different fields. VarScan offers an explanation on their website:

VarScan requires that bases meet the minimum Phred quality score (default 15 for most commands) to count them for things like read counts (reads1, reads2) and to compute variant allele frequency. However, when VarScan reports the depth (such as in the DP field of VCF output), it reports SAMtools raw depth. To get VarScan read counts to more closely match another tool, set use parameter --min-avg-qual 0. And use caution! Low-quality bases, with the occasional exception of BAQ penalties, should not be trusted.

Also, VarScan reports variants on a biallelic basis. That is, for a given SNP call, the "reads1" column is the number of reference-supporting reads (RD), and the "reads2" column is the number of variant-supporting reads (AD). There may be additional reads at that position showing other bases (SNP or indel variants). If these other variants meet the calling criteria, they will be reported in their own line. If not, it may look like you have "missing" reads.

ADD COMMENTlink written 14 months ago by igor9.9k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1786 users visited in the last hour