FMT/DP of missing genotypes
Entering edit mode
2.6 years ago

I have a VCF file where the FMT/DP of missing genotypes has been recorded itself as missing data (".")

When I filter this file in bcftools, such that only sites where at least 90% of samples have a depth >= 10, I am returned 8940 sites.

bcftools filter unfiltered.bcf.gz -i 'F_PASS(FMT/DP  >= 10) >= 0.9' -Ob -o filtered.bcf.gz
bcftools view filtered.bcf.gz -H | wc -l #8940 sites

When I perform an equivalent filtering operation in VariantAnnotation in R, I get different results depending on whether I replace this "." with zeros, or remove missing data from the calculation, and neither value is 8940:

data <- readVcf(file="unfiltered.vcf.bgz")
depth <- geno(data)$DP

# replace NAs with 0
depth[] <- 0 
DP0.1 <- apply(depth,1,quantile,p=0.1)
DP_filter <- (DP0.1>=10)
sum(DP_filter)  #8953

# don't replace NAs with 0, but remove from quantile calculation
depth <- geno(data)$DP
DP0.1 <- apply(depth,1,quantile,p=0.1,na.rm=T)
DP_filter <- (DP0.1>=10)
sum(DP_filter)  #9196

I am wondering how bcftools parses the missing values in this case, since it does not conform to either of the options I can simulate in R.

Relatedly, is there a way to edit the FMT/DP attribute of specific genotypes and/or, change the way bcftools treats missing data when operating on FMT/DP information? I.e., what I would really like to do is replace the depth of all missing genotypes with 0s.


SNP filtering bcftools VariantAnnotation depth • 665 views

Login before adding your answer.

Traffic: 1530 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6