5 days ago
qwzhang0601
Hello:
I am trying to filter a single-sample VCF file to reduce false-positive (FP) variants. The panel contains some known challenging genes, so I want to filter less strictly in the challenging regions and apply more filtering parameters everywhere else. My current script produces the final VCF in three steps. Is there a way to add a "regions" constraint to the filtering expression, so that I can get the final VCF in a single step? Something like this:
$ bcftools filter -e '(FS>50 | FMT/AF[0:0] < 0.15) & TYPE="snp" & regions NOT_IN ${file_challenge_regions}'
Below is my current script.
# BED file of regions that are challenging for variant calling, where less filtering is applied
file_challenge_regions=challenge_regions.bed
# step 1 (non-challenge regions): first filter out variants with DP < 15, then, for SNPs not in "challenge_regions.bed", filter out those with FS > 50 or AF < 0.15
$ bcftools filter -e 'FMT/DP[0] < 15' normalized.vcf.gz | bcftools filter -e '(FS>50 | FMT/AF[0:0] < 0.15) & TYPE="snp"' -T ^${file_challenge_regions} -Ov -o nonchallenge.final.vcf
# step 2 (challenge regions): only filter out variants with DP < 15
$ bcftools filter -e 'FMT/DP[0] < 15' -T ${file_challenge_regions} normalized.vcf.gz -Ov -o challenge.final.vcf
# step 3: combine the variants from the challenge and non-challenge regions
$ time bcftools concat -a --rm-dups=none -Ov -o final.vcf nonchallenge.final.vcf challenge.final.vcf
Thanks
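For reference, one possible single-pass variant of the three steps above is sketched below. It is not a built-in "regions" operator. The idea is to flag records that overlap the BED file with `bcftools annotate --mark-sites`, then test that flag inside one `bcftools filter` expression. The flag name `CHALLENGE` and the output name `final.vcf.gz` are hypothetical, and the sketch assumes the BED file is coordinate-sorted and that `bgzip` and `tabix` are installed. Testing an absent flag with `INFO/CHALLENGE=0` follows my reading of the bcftools expression syntax, so check it against your bcftools version.

```shell
#!/bin/sh
# Sketch only: CHALLENGE (INFO flag) and final.vcf.gz are illustrative names.
bed=challenge_regions.bed
in_vcf=normalized.vcf.gz

# Exclude low-depth records everywhere; exclude SNPs failing FS/AF only
# where the CHALLENGE flag is absent (i.e. outside the challenging regions).
expr='FMT/DP[0] < 15 | ((FS>50 | FMT/AF[0:0] < 0.15) & TYPE="snp" & INFO/CHALLENGE=0)'

# Run the pipeline only when the tool and both input files are present.
if command -v bcftools >/dev/null 2>&1 && [ -f "$in_vcf" ] && [ -f "$bed" ]; then
    # bcftools annotate expects a bgzip-compressed, tabix-indexed annotation file.
    bgzip -c "$bed" > "$bed.gz" && tabix -f -p bed "$bed.gz"
    # Flag records overlapping the BED intervals, then filter in the same pipe.
    bcftools annotate -a "$bed.gz" -c CHROM,FROM,TO -m +CHALLENGE "$in_vcf" |
        bcftools filter -e "$expr" -Oz -o final.vcf.gz -
fi
```

Because every record stays in one stream, no concatenation step is needed. The added flag remains in the output INFO column unless it is removed afterwards.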