Question: VarScan: How is the DP score in a .vcf calculated?
1
gravatar for James Reeve
3 months ago by
James Reeve70
Canada
James Reeve70 wrote:

I'm trying to understand the formats of different SNP callers. I have made test bam files with a subset of two stickleback WGS samples. I made a vcf using VarScan's commad mpileup2snp.

Here's an example of a single SNP in the vcf;

chrI    1789    .       T       G       .       PASS    ADP=105;WT=1;HET=1;HOM=0;NC=0   GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR    0/0:66:108:108:87:18:17.14%:1.7211E-6:63:74:64:23:15:3  0/1:68:102:102:73:21:22.34%:1.3568E-7:60:56:48:25:19:2

I can understand most of the scores in the #FORMAT field, however I'm confused how DP is calculated. The reference and alternative allele depths are given (RD & AD), but the sum of the two doesn't equal DP. Using the example above;

Sample 1:

  • DP = 108
  • AD = 18
  • RD = 87
  • AD + RD = 105

Sample 2 :

  • DP = 102
  • AD = 21
  • RD = 73
  • AD + RD = 94

Does anyone have any ideas why the reported DP doesn't equal AD + RD?

varscan vcf • 256 views
ADD COMMENTlink modified 3 months ago by igor7.6k • written 3 months ago by James Reeve70

I don't think it's so simple, have a look at the code, varscan calls a method getReadCounts

```

  HashMap<String, String> readCounts = VarScan.getReadCounts(refBase, readBases, readQualities, minAvgQual, mapQualities);
  (...)

/**
 * Counts the number, quality, and strands of each allele from a pileup
 *
 * @param   refBase     Reference base at this position
 * @param   readBases   String of read bases from pileup
 * @param   readQuals   String of read base qualities from pileup
 * @param   minAvgQual  Integer of minimum required base quality to count a base.
 * @return  results     HashMap<String, String> of results for each allele
 */
static HashMap<String, String> getReadCounts(String refBase, String readBases, String readQuals, int minAvgQual, String mapQuals)
{
    HashMap<String, Integer> readCounts = new HashMap<String, Integer>();
    HashMap<String, Integer> readCountsPlus = new HashMap<String, Integer>();
    HashMap<String, Integer> readCountsMinus = new HashMap<String, Integer>();
    HashMap<String, Integer> qualitySum = new HashMap<String, Integer>();
    HashMap<String, Integer> mapQualitySum = new HashMap<String, Integer>();
    HashMap<String, String> strandsSeen = new HashMap<String, String>();

(...) ````

ADD REPLYlink modified 3 months ago • written 3 months ago by Pierre Lindenbaum120k
1
gravatar for igor
3 months ago by
igor7.6k
United States
igor7.6k wrote:

Different variant have different rules regarding thresholds for counting reads. Additionally, these criteria may vary for different fields. VarScan offers an explanation on their website:

VarScan requires that bases meet the minimum Phred quality score (default 15 for most commands) to count them for things like read counts (reads1, reads2) and to compute variant allele frequency. However, when VarScan reports the depth (such as in the DP field of VCF output), it reports SAMtools raw depth. To get VarScan read counts to more closely match another tool, set use parameter --min-avg-qual 0. And use caution! Low-quality bases, with the occasional exception of BAQ penalties, should not be trusted.

Also, VarScan reports variants on a biallelic basis. That is, for a given SNP call, the "reads1" column is the number of reference-supporting reads (RD), and the "reads2" column is the number of variant-supporting reads (AD). There may be additional reads at that position showing other bases (SNP or indel variants). If these other variants meet the calling criteria, they will be reported in their own line. If not, it may look like you have "missing" reads.

ADD COMMENTlink written 3 months ago by igor7.6k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 676 users visited in the last hour