FMT/DP of missing genotypes
0
0
Entering edit mode
10 weeks ago

I have a VCF file where the FMT/DP of missing genotypes has been recorded itself as missing data (".")

When I filter this file in bcftools, such that only sites where at least 90% of samples have a depth >= 10, I am returned 8940 sites.

bcftools filter unfiltered.bcf.gz -i 'F_PASS(FMT/DP  >= 10) >= 0.9' -Ob -o filtered.bcf.gz
bcftools view filtered.bcf.gz -H | wc -l #8940 sites


When I perform an equivalent filtering operation in VariantAnnotation in R, I get different results depending on whether I replace this "." with zeros, or remove missing data from the calculation, and neither value is 8940:

data <- readVcf(file="unfiltered.vcf.bgz")
depth <- geno(data)$DP # replace NAs with 0 depth[is.na(depth)] <- 0 DP0.1 <- apply(depth,1,quantile,p=0.1) DP_filter <- (DP0.1>=10) sum(DP_filter) #8953 # don't replace NAs with 0, but remove from quantile calculation depth <- geno(data)$DP
DP0.1 <- apply(depth,1,quantile,p=0.1,na.rm=T)
DP_filter <- (DP0.1>=10)
sum(DP_filter)  #9196


I am wondering how bcftools parses the missing values in this case, since it does not conform to either of the options I can simulate in R.

Relatedly, is there a way to edit the FMT/DP attribute of specific genotypes and/or, change the way bcftools treats missing data when operating on FMT/DP information? I.e., what I would really like to do is replace the depth of all missing genotypes with 0s.

Thanks!

SNP filtering bcftools VariantAnnotation depth • 160 views