Question: bcftools filter doesn't filter variants based on expression
Adamo0_91 wrote, 11 months ago:

Hello,

I'm trying to filter the records of my VCF file by a condition: I'd like to keep only variants where the alternate allele is present in at least 25% of the reads (alternate allele frequency). I usually use SnpSift filter, but I can't get at the FORMAT fields of the VCF file with that tool, so I chose bcftools, which unfortunately I'm not familiar with. What did I do? Following the vignette, I compressed the file and created an index:

bgzip -c input.vcf > input.vcf.gz
bcftools index input.vcf.gz

Then I tried to filter:

bcftools filter -i 'FORMAT/AD[0:1]*100/(FORMAT/AD[0:0]+FORMAT/AD[0:1]) >= 25' input.vcf.gz > output.vcf

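I understand this as follows:

FORMAT/AD[0:1] - number of reads supporting the ALT allele (first sample, second AD value)
FORMAT/AD[0:0] - number of reads supporting the REF allele (first sample, first AD value)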

Of course I could use DP instead of FORMAT/AD[0:0]+FORMAT/AD[0:1]. However, the variants were called with GATK, and AD is already filtered, in contrast to DP. What is more, DP counts all reads, i.e. all possible alleles, while AD counts only the REF and ALT reads.

The problem is this: when I run the command, many variants are indeed filtered out. However, when I check the frequency manually, many of the remaining variants have a percentage below 25. I don't know why. I think it is some stupid mistake that I can't spot. What do you think about this command?
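For the manual check described above, a small awk helper can compute the ALT read percentage per record straight from the VCF text. This is only a sketch: the function name `alt_fraction` is hypothetical, it assumes a single-sample VCF with an AD entry in the FORMAT column, it looks only at the first ALT allele, and the records fed to it here are made up for illustration.

```shell
# alt_fraction: print CHROM, POS and the ALT read percentage of the first
# sample for each record of a VCF read from stdin.
# Hypothetical helper, not part of bcftools.
alt_fraction() {
  awk -F'\t' '
    /^#/ { next }                        # skip header lines
    {
      n = split($9, keys, ":")           # FORMAT column, e.g. GT:AD:DP
      split($10, vals, ":")              # first sample column
      for (i = 1; i <= n; i++)
        if (keys[i] == "AD") {
          split(vals[i], ad, ",")        # ad[1] = REF reads, ad[2] = first ALT reads
          total = ad[1] + ad[2]
          if (total > 0)                 # avoid division by zero
            printf "%s\t%s\t%.1f\n", $1, $2, 100 * ad[2] / total
        }
    }'
}

# Made-up records for demonstration only
printf '##fileformat=VCFv4.2\nchr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t0/1:30,10:40\nchr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:AD:DP\t0/1:45,5:50\n' | alt_fraction
# prints:
# chr1	100	25.0
# chr1	200	10.0
```

On a real file this would be run as `alt_fraction < output.vcf`, and any line with a value below 25 points to a record the filter should have removed.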

Adamo0_91 wrote, 11 months ago:

I got it now... It was the brackets. The numerator has to be wrapped in parentheses:

bcftools filter -i '(FORMAT/AD[0:1]*100)/(FORMAT/AD[0:0]+FORMAT/AD[0:1]) >= 25' input.vcf.gz > output.vcf
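One way to confirm that the filtered file no longer holds low-frequency records is to list every record whose ALT percentage is still under the cutoff and expect an empty result. The sketch below assumes a single-sample file and tab-separated `CHROM`, `POS`, `AD` input such as `bcftools query` can produce. The helper name `below_threshold` is hypothetical and the demo input is made up.

```shell
# below_threshold: read "CHROM<TAB>POS<TAB>AD" lines from stdin and print
# those whose ALT read percentage is under the cutoff (default 25).
# Hypothetical helper, not part of bcftools.
below_threshold() {
  awk -F'\t' -v min="${1:-25}" '
    {
      split($3, ad, ",")                 # ad[1] = REF reads, ad[2] = first ALT reads
      total = ad[1] + ad[2]
      if (total > 0 && 100 * ad[2] / total < min)
        print                            # record fails the cutoff
    }'
}

# Intended use on a single-sample file (requires bcftools):
#   bcftools query -f '%CHROM\t%POS[\t%AD]\n' output.vcf | below_threshold 25

# Made-up input for demonstration only
printf 'chr1\t100\t30,10\nchr1\t200\t45,5\n' | below_threshold 25
# prints:
# chr1	200	45,5
```

If the filter expression behaves as intended, running the commented pipeline on output.vcf should print nothing.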