Finding The Number Of Reads Which Support A Variant In A VCF
9.3 years ago
Luca Beltrame ▴ 240

In my group we had a number of "problematic" sequencing runs, so I was asked to make sure that the variants reported by my analyses were sufficiently covered and within the sensitivity limit of the validation instrument (5%), so that they can be validated correctly.

Upon looking at my VCFs and the spec, though, I noticed that the DP field for each sample in a multisample VCF reports all the reads found, regardless of whether they carry the reference or the alternate base(s). GATK's VCFs have the AD field, but, at least according to their documentation, using it is not recommended because it includes unfiltered reads.

Considering that I have full access to all the files generated for the analysis, what's the best course of action to extract the coverage for the reference and the variant allele at a given site in the VCF file?

vcf sequencing variant-calling depth-of-coverage • 7.0k views

Isn't there a DP4 field in the VCF showing read coverage for ref/alt on both strands (four numbers in total)? For some reason, though, the four numbers do not necessarily add up to the DP field; maybe some filtered reads don't count?


Correct. DP is not filtered, DP4 is.
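
For what it's worth, here is a minimal Python sketch of pulling ref/alt support out of DP4. It assumes the INFO column carries `DP4=ref-forward,ref-reverse,alt-forward,alt-reverse`, as described above; the function name and the example INFO string are made up for illustration.

```python
def dp4_alt_fraction(info):
    """Return (ref_count, alt_count, alt_fraction) from a VCF INFO string.

    Assumes the INFO string contains a DP4 entry with four comma-separated
    integers: ref-forward, ref-reverse, alt-forward, alt-reverse.
    """
    # Split "KEY=VALUE;KEY=VALUE;FLAG" into a dict, skipping bare flags.
    fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in fields["DP4"].split(","))
    ref = ref_fwd + ref_rev
    alt = alt_fwd + alt_rev
    total = ref + alt
    fraction = alt / total if total else 0.0
    return ref, alt, fraction


# Hypothetical INFO string, for illustration only.
ref, alt, fraction = dp4_alt_fraction("DP=60;DP4=30,20,3,2;MQ=50")
print(ref, alt, round(fraction, 3))
```

Note that the fraction is computed over the filtered DP4 reads only, which is why it need not match a value derived from DP.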


Yes, DP4 should be good if you want allele counts aggregated across all samples. If you want this broken down per sample, GATK's AD field is the only out-of-the-box solution I know of (as far as I can tell, samtools doesn't do anything similar).
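
If you do go the AD route, a rough Python sketch for the per-sample breakdown could look like this. It assumes a standard tab-separated VCF data line whose FORMAT column lists an `AD` key holding comma-separated ref,alt counts; the function name and the example record are hypothetical. Keep in mind the caveat from the question that AD includes unfiltered reads.

```python
def per_sample_alt_fraction(vcf_line):
    """Return one (ref, alt, alt_fraction) tuple per sample, or None if AD is missing."""
    cols = vcf_line.rstrip("\n").split("\t")
    format_keys = cols[8].split(":")          # e.g. ["GT", "AD", "DP"]
    results = []
    for sample in cols[9:]:                   # sample columns start at index 9
        values = dict(zip(format_keys, sample.split(":")))
        ad = values.get("AD", ".")
        if "." in ad.split(","):              # missing or partially missing AD
            results.append(None)
            continue
        counts = [int(x) for x in ad.split(",")]
        ref, alt = counts[0], sum(counts[1:])  # pool all alternate alleles
        total = ref + alt
        results.append((ref, alt, (alt / total) if total else 0.0))
    return results


# Hypothetical two-sample record, for illustration only.
line = "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=100\tGT:AD:DP\t0/1:45,5:50\t0/0:50,0:50"
for result in per_sample_alt_fraction(line):
    print(result)
```

Pooling all alternate alleles into one count is a simplification; at multi-allelic sites you would probably want to keep them separate.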

9.3 years ago

When you say "I have full access to all the files generated for the analysis", does that include the BAM files? If so, you can take the sites you're interested in and pull read counts directly from the BAM with something like bam-readcount.
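
As a sketch of how the output could then be checked against the 5% limit mentioned in the question: the Python below assumes that each bam-readcount output line is tab-separated, starting with chromosome, position, reference base and depth, followed by one colon-separated entry per base in which the base comes first and its read count second. Please check that against the output of your bam-readcount version. The function names and the example line (with truncated per-base fields) are hypothetical.

```python
def parse_readcount_line(line):
    """Parse one bam-readcount-style line into (chrom, pos, ref_base, depth, counts)."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, ref_base, depth = cols[0], int(cols[1]), cols[2].upper(), int(cols[3])
    counts = {}
    for entry in cols[4:]:
        parts = entry.split(":")
        counts[parts[0]] = int(parts[1])   # assumed layout: base, then count
    return chrom, pos, ref_base, depth, counts


def allele_support(line, alt_base, min_fraction=0.05):
    """Return (ref_count, alt_count, alt_fraction, passes) for one site."""
    _, _, ref_base, depth, counts = parse_readcount_line(line)
    ref = counts.get(ref_base, 0)
    alt = counts.get(alt_base.upper(), 0)
    fraction = alt / depth if depth else 0.0
    # min_fraction defaults to the 5% validation sensitivity from the question.
    return ref, alt, fraction, fraction >= min_fraction


# Hypothetical output line, for illustration only.
example = (
    "chr1\t12345\tA\t50\t=:0:0.00:0.00\tA:45:60.00:35.00\t"
    "C:0:0.00:0.00\tG:5:60.00:35.00\tT:0:0.00:0.00\tN:0:0.00:0.00"
)
print(allele_support(example, "G"))
```

The fraction here is taken over the reported depth at the site rather than over ref + alt only, so reads carrying a third base lower it.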


Exactly, I meant that I have BAM files available. In fact, the suggested tool does exactly what I want. Thanks!