Use bcftools to filter a VCF with multiple filter flags
17 months ago
Ram 35k

Good afternoon,

I'm trying to filter a VCF file that has the following dummy flag values:

• PASS: All filters passed
• Fa: Failed filter a
• Fb: Failed filter b
• Fc: Failed filter c
• Fd: Failed filter d

Variants can fail one or more filters. Variants that fail multiple filters are annotated with the corresponding flags separated by semicolons. Thus, the FILTER column can hold one of the five values above, or any number of F* values separated by ;.

I'd like to select all variants that either PASSed or failed only filter a. How can I do this in bcftools? The -f option skips sites that do not contain any of the listed filters, so it keeps every site that contains at least one of them. When I use

bcftools view -f PASS,Fa ...


I also get rows that failed filter a together with other filters. That is, the above expression matches both Fa and Fa;Fb. I tried excluding the delimiter, but that didn't work:

bcftools view -f 'PASS,Fa,;' ... #didn't work


Does anyone know how to exclude or include exactly a list of filters? Nothing in the -i or -e EXPRESSIONS is useful either.

This is what I'm using right now, which uses awk to mimic what I want from bcftools:

zcat vcf_file.vcf.gz | awk -F"\t" -v OFS="\t" '$0 ~ /^#/ {print; next} $7=="PASS" || $7=="Fa" {print}'
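
The same awk idea can be generalized to any list of allowed values. This is only a sketch: filter_exact is a made-up helper name, and it assumes a well-formed, tab-delimited VCF on stdin with FILTER in column 7. Because the whole field is compared, a combined value such as Fa;Fb is rejected unless it is listed explicitly.

```shell
# Hypothetical helper (not part of bcftools): keep header lines plus records
# whose FILTER column (field 7) is exactly one of the allowed values.
# $1: comma-separated list of allowed FILTER values, e.g. "PASS,Fa"
filter_exact() {
    awk -F '\t' -v allowed="$1" '
        BEGIN {
            # Build a lookup table of the allowed FILTER strings
            n = split(allowed, a, ",")
            for (i = 1; i <= n; i++) ok[a[i]] = 1
        }
        /^#/ { print; next }   # pass header lines through unchanged
        ($7 in ok)             # whole-field match, so "Fa;Fb" does not match "Fa"
    '
}
```

Usage would look like: zcat vcf_file.vcf.gz | filter_exact PASS,Fa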

Update: I tried this, but it picked up an entry that it was not supposed to pick up:

bcftools view -i 'FILTER=="PASS" | FILTER=="Fa"' vcf_file.gz
bcftools view -i 'FILTER=="PASS" || FILTER=="Fa"' vcf_file.gz


It picked up an entry where the FILTER value was Fa;Fb.

Update #2: I've opened an issue on the bcftools GitHub: https://github.com/samtools/bcftools/issues/1285

17 months ago
Ram 35k

This feature was lacking in bcftools, and the developer has now fixed that with this commit: https://github.com/samtools/bcftools/commit/fea8773196878481399183fd9f711685d41e6cf9

Starting with bcftools v1.10.3, it should be possible to do this sort of exact filtering.
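
For anyone who wants to try it, here is a rough sketch of how the expression could be built. It assumes a bcftools build that includes the commit above and that, as I read the expressions documentation, FILTER="Fa" is then an exact match (Fa;Fb does not pass) while FILTER~"Fa" is a subset match. The helper name exact_filter_expr is made up for illustration and is not part of bcftools.

```shell
# Hypothetical helper: turn a comma-separated list such as "PASS,Fa" into
# a bcftools expression of the form: FILTER="PASS" || FILTER="Fa"
exact_filter_expr() {
    printf '%s\n' "$1" | awk -F, '{
        for (i = 1; i <= NF; i++)
            printf "%sFILTER=\"%s\"", (i > 1 ? " || " : ""), $i
        print ""   # terminate the expression with a newline
    }'
}

# Intended use (needs bcftools and the VCF from the question):
#   bcftools view -i "$(exact_filter_expr PASS,Fa)" vcf_file.vcf.gz
```

With such a build, the command in the comment should keep records whose FILTER is exactly PASS or exactly Fa; on older builds the same expression behaves as described in the question.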