How to interpret DP fields for samples in VCF files?
10.9 years ago
Luca Beltrame ▴ 240

I'm doing a matched comparison of samples, and I'm trying to filter the results by depth. However, I'm not sure how to interpret the per-sample DP field.

Here is an example: suppose we have matched Sample A and Sample B, and at a particular locus we have a mutation (SNP).

Case 1

  • DP for Sample A reports 10
  • DP for Sample B reports 15

Case 2

  • DP for Sample A reports 10
  • DP for Sample B reports none (no DP in genotype)

My problem is how to interpret Case 2 (and similar scenarios, e.g. Sample A having no DP). Given that the per-sample DP (at least as reported by the GATK) counts only reads that pass the quality control filters, which of these scenarios is most likely?

  1. Nothing can be done: the locus for that specific sample may or may not be wild type, but the filtered read depth is not sufficient to determine that (in R terms, this would be NA)
  2. The locus is assumed wild type due to lack of supporting information (reads)
  3. A wild type locus does not have DP information

This matters to me because I'm currently keeping only loci where DP is present in both matched samples and above a threshold, and I was wondering whether that is too restrictive.
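For concreteness, the filter described above can be sketched roughly as follows. This is a minimal illustration using plain string parsing rather than a VCF library; the function names `sample_depths` and `passes_depth`, the two example records, and the threshold of 8 are all made up for the example. A missing DP is kept as `None` (the NA of scenario 1) instead of being treated as 0.

```python
def sample_depths(vcf_line):
    """Return one DP value per sample column; None when DP is absent or '.'."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")              # FORMAT column, e.g. GT:DP
    depths = []
    for sample in fields[9:]:
        # Trailing sub-fields may be omitted, so a bare "./." carries no DP
        entry = dict(zip(keys, sample.split(":")))
        raw = entry.get("DP", ".")
        depths.append(None if raw in (".", "") else int(raw))
    return depths


def passes_depth(vcf_line, min_dp):
    """True only if every sample has a DP value and it is >= min_dp."""
    return all(d is not None and d >= min_dp for d in sample_depths(vcf_line))


# Hypothetical records mirroring Case 1 and Case 2 (Sample A, Sample B)
case1 = "chr1\t100\t.\tA\tAT\t50\t.\t.\tGT:DP\t0/1:10\t0/0:15"
case2 = "chr1\t200\t.\tA\tAT\t50\t.\t.\tGT:DP\t0/1:10\t./."

print(sample_depths(case1), passes_depth(case1, 8))  # [10, 15] True
print(sample_depths(case2), passes_depth(case2, 8))  # [10, None] False
```

With this approach Case 2 is dropped because nothing can be said about Sample B, not because Sample B is assumed to be wild type.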

For reference, these results refer to indels generated with the GATK's UnifiedGenotyper in indel mode.

vcf sequencing • 3.8k views

You could check the pileup at that particular locus to confirm that the missing DP is caused by a lack of reads spanning that genomic position in that sample. If that is the case, I presume you cannot make a direct comparison for this SNP between the two samples.

10.9 years ago

Do you have your samples in one .vcf file or in separate .vcf files? I am going to assume they are all in one .vcf file. In practical terms, scenarios 1 and 3 are the same: whether you have 0 reads or 2 reads, nothing can be concluded, so the result is the same. If the genotype is shown as ./. , that means it hasn't been called at all.
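As a rough illustration of telling a no-call apart from a called wild-type genotype, here is a small sketch that looks only at the GT sub-field. The helper name `call_status` and its three labels are assumptions made for this example, not anything produced by GATK.

```python
import re


def call_status(format_field, sample_field):
    """Classify one sample entry as 'no-call', 'hom-ref' or 'variant'."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    gt = dict(zip(keys, values)).get("GT", ".")
    alleles = re.split(r"[/|]", gt)          # unphased "/" or phased "|"
    if all(a == "." for a in alleles):
        return "no-call"                     # e.g. ./. : genotype not called
    if any(a not in ("0", ".") for a in alleles):
        return "variant"                     # at least one non-reference allele
    return "hom-ref"                         # called, reference alleles only


print(call_status("GT:DP", "./."))      # no-call
print(call_status("GT:DP", "0/0:15"))   # hom-ref
print(call_status("GT:DP", "0/1:10"))   # variant
```

Only the "hom-ref" case is evidence for wild type; "no-call" should be handled as missing data.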

I have not heard of scenario 2 happening. I have only been using GATK for a short time, but I don't think it happens.

Hope this helps.
