FMT/DP of missing genotypes
0
0
Entering edit mode
3.2 years ago

I have a VCF file where the FMT/DP of missing genotypes has been recorded itself as missing data (".")

When I filter this file in bcftools, such that only sites where at least 90% of samples have a depth >= 10, I am returned 8940 sites.

bcftools filter unfiltered.bcf.gz -i 'F_PASS(FMT/DP  >= 10) >= 0.9' -Ob -o filtered.bcf.gz
bcftools view filtered.bcf.gz -H | wc -l #8940 sites

When I perform an equivalent filtering operation in VariantAnnotation in R, I get different results depending on whether I replace this "." with zeros, or remove missing data from the calculation, and neither value is 8940:

data <- readVcf(file="unfiltered.vcf.bgz")
depth <- geno(data)$DP

# replace NAs with 0
depth[is.na(depth)] <- 0 
DP0.1 <- apply(depth,1,quantile,p=0.1)
DP_filter <- (DP0.1>=10)
sum(DP_filter)  #8953

# don't replace NAs with 0, but remove from quantile calculation
depth <- geno(data)$DP
DP0.1 <- apply(depth,1,quantile,p=0.1,na.rm=T)
DP_filter <- (DP0.1>=10)
sum(DP_filter)  #9196

I am wondering how bcftools parses the missing values in this case, since it does not conform to either of the options I can simulate in R.

Relatedly, is there a way to edit the FMT/DP attribute of specific genotypes and/or, change the way bcftools treats missing data when operating on FMT/DP information? I.e., what I would really like to do is replace the depth of all missing genotypes with 0s.

Thanks!

SNP filtering bcftools VariantAnnotation depth • 809 views
ADD COMMENT

Login before adding your answer.

Traffic: 1569 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6