bcftools filter doesn't filter variants based on expression
4.5 years ago
Adamo0_91 ▴ 20

Hello,

I'm trying to filter rows from my VCF file based on a condition: I'd like to keep only variants where the alternate allele is present in at least 25% of the reads (alternate allele frequency). I usually use SnpSift filter, but I can't get at the FORMAT fields of the VCF file with that tool, so I chose bcftools, and unfortunately I'm not familiar with it. What did I do? Following the vignette, I compressed and indexed the file:

bgzip -c input.vcf > input.vcf.gz
bcftools index input.vcf.gz

Then I tried to filter:

bcftools filter -i 'FORMAT/AD[0:1]*100/(FORMAT/AD[0:0]+FORMAT/AD[0:1]) >= 25' input.vcf.gz > output.vcf

I understand this as follows:

FORMAT/AD[0:1] - number of reads supporting the ALT allele (first sample)
FORMAT/AD[0:0] - number of reads supporting the REF allele (first sample)

Of course I could use DP instead of FORMAT/AD[0:0]+FORMAT/AD[0:1]. However, the variants were called with GATK, and AD is already filtered whereas DP is not. What is more, DP counts all reads, i.e. reads for every possible allele, while AD counts only the REF and ALT reads.

The problem is this: when I run the command, many variants are indeed filtered out. However, when I check the frequency manually, many of the remaining variants have a percentage below 25. I don't know why; I think it's some stupid mistake that I can't catch. What do you think about this command?
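For anyone who wants to do the manual check less by hand, here is a minimal sketch that prints the ALT percentage next to each record. It assumes a single-sample VCF (so `[%AD]` yields one `ref,alt` value per line) and uses only the first two AD values; `alt_pct` is a hypothetical helper name, not part of bcftools.

```shell
# Hypothetical helper: reads tab-separated "CHROM POS AD" lines from stdin,
# where AD looks like "ref,alt[,...]", and appends the ALT percentage
# computed as 100 * alt / (ref + alt) from the first two AD values.
alt_pct() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
  {
    n = split($3, ad, ",")
    # Missing AD or zero supporting reads: print NA instead of dividing by zero.
    if (n < 2 || ad[1] == "." || ad[2] == "." || ad[1] + ad[2] == 0) {
      print $1, $2, $3, "NA"
      next
    }
    printf "%s\t%s\t%s\t%.1f\n", $1, $2, $3, 100 * ad[2] / (ad[1] + ad[2])
  }'
}

# Usage on the filtered file (requires bcftools; single-sample VCF assumed):
# bcftools query -f '%CHROM\t%POS\t[%AD]\n' output.vcf | alt_pct
```

Any line whose last column is below 25 in the filtered output points to a record that the filter expression did not treat the way you expected.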

vcf • 2.4k views
4.5 years ago
Adamo0_91 ▴ 20

I got it now... Brackets. The numerator needs its own pair:

bcftools filter -i '(FORMAT/AD[0:1]*100)/(FORMAT/AD[0:0]+FORMAT/AD[0:1]) >= 25' input.vcf.gz > output.vcf