I am filtering (quality control) a VCF file from exome sequencing.
I already filtered by the VQSR threshold and kept only PASS loci.
While trying to decide on a "DP" cutoff, I encountered a strange case: for a few variants in one gene, the DP distributions of the 0/0 and 0/1 genotypes are very different, as shown below.
I was expecting a roughly normal distribution for both genotypes. Instead, I see a lot of 0/0 calls at "low DP" (around 30 to 50). In the 0/1 figure, it seems that a DP of at least 80 is needed to call a 0/1.
When I look at another gene, the situation is similar, but the apparent threshold is slightly different.
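For context, here is a minimal sketch of one way the per-genotype DP values could be tabulated from VCF records. It assumes a multi-sample VCF whose FORMAT column contains both GT and DP; the function name `dp_by_genotype` and the example records are hypothetical and only for illustration.

```python
from collections import defaultdict

def dp_by_genotype(vcf_lines):
    """Collect per-sample DP values grouped by genotype (e.g. '0/0', '0/1')."""
    depths = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample columns present
        keys = fields[8].split(":")
        if "GT" not in keys or "DP" not in keys:
            continue
        gt_idx, dp_idx = keys.index("GT"), keys.index("DP")
        for sample in fields[9:]:
            values = sample.split(":")
            # Trailing FORMAT fields may be dropped and DP may be '.', so guard both.
            if dp_idx >= len(values) or not values[dp_idx].isdigit():
                continue
            gt = values[gt_idx].replace("|", "/")  # treat phased and unphased alike
            depths[gt].append(int(values[dp_idx]))
    return depths

# Made-up records, only to show the expected input shape.
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/0:35\t0/1:90\t./.:.",
]
result = dp_by_genotype(example)
```

The lists in `result["0/0"]` and `result["0/1"]` can then be plotted as histograms to compare the two distributions.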
My questions are:
- How exactly does DP affect variant calling?
- Are those 0/0 calls with a DP of 30 to 50 potential false negatives? That is, are some of them "actually" 0/1 but without enough depth to be called 0/1?
- What is an ideal cutoff of DP to use, especially for 0/0?
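
To make the second question concrete, here is a rough back-of-the-envelope sketch. It assumes that reads at a true heterozygous site are sampled independently with an alt-allele fraction of 0.5, and that a hypothetical minimum of 3 alt reads is needed before a 0/1 would be considered. Both numbers are assumptions for illustration; real callers work with genotype likelihoods rather than a hard read count.

```python
from math import comb

def prob_alt_reads_below(depth, min_alt_reads, alt_fraction=0.5):
    """P(X < min_alt_reads) for X ~ Binomial(depth, alt_fraction)."""
    return sum(
        comb(depth, k) * alt_fraction**k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )

# Chance that a true het shows fewer than 3 alt reads at several depths.
for depth in (10, 30, 50, 80):
    p = prob_alt_reads_below(depth, min_alt_reads=3)
    print(f"DP={depth}: P(fewer than 3 alt reads at a true het) = {p:.3g}")
```

Under these simple assumptions, the chance of a true het showing almost no alt reads is already very small at a DP of 30, which is part of why the 0/0 versus 0/1 pattern above looks strange to me.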