Filter a VCF based on an INFO field that has a key but no value
4.2 years ago
curious ▴ 750

I have some INFO fields that look like this:

##INFO=<ID=AF,Number=1,Type=Float,Description="Estimated Alternate Allele Frequency">
##INFO=<ID=MAF,Number=1,Type=Float,Description="Estimated Minor Allele Frequency">
##INFO=<ID=R2,Number=1,Type=Float,Description="Estimated Imputation Accuracy (R-square)">
##INFO=<ID=ER2,Number=1,Type=Float,Description="Empirical (Leave-One-Out) R-square (available only for genotyped variants)">
##INFO=<ID=IMPUTED,Number=0,Type=Flag,Description="Marker was imputed but NOT genotyped">
##INFO=<ID=TYPED,Number=0,Type=Flag,Description="Marker was genotyped AND imputed">
##INFO=<ID=TYPED_ONLY,Number=0,Type=Flag,Description="Marker was genotyped but NOT imputed">

I understand that if I wanted to filter on an INFO key-value pair, I could run:

bcftools view -e 'R2<0.9' my_fav.vcf.gz

The IMPUTED, TYPED, TYPED_ONLY keys appear in the info field with no corresponding value, for example:

AF=0.00016;MAF=0.03036;R2=0.13409;IMPUTED

is a complete info field.

Is there a way to separate TYPED from IMPUTED variants using bcftools? I know I could probably grep my way through this if need be.

bcftools vcf • 2.4k views

Interesting question! Can you check whether an aggregate function (such as COUNT(IMPUTED)>0) picks up these entries?

COUNT() can be applied only on FORMAT fields

it seems :(. This valueless key is a standard output of the Minimac imputation software; I wish it were the value of some key, though.


Have you tried:

bcftools view -e 'IMPUTED' my_fav.vcf.gz

or

bcftools view -i 'IMPUTED' my_fav.vcf.gz


This was the first thing I tried before posting. When it did not work, I was basically stuck because I couldn't see how to do it outside of bcftools. I like using bcftools because it sometimes helps me catch corrupted VCFs.
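
For reference, a fallback outside of bcftools could look roughly like the sketch below. It splits the INFO column (column 8) on ";" and compares each entry exactly, so that TYPED does not also match TYPED_ONLY, which a plain grep would do. The function name filter_by_info_flag and its keep/drop argument are made up for this illustration, and the sketch assumes a tab-delimited VCF on standard input.

```shell
# Hypothetical helper: filter_by_info_flag FLAG keep|drop < in.vcf
# Header lines (starting with '#') are always passed through.
filter_by_info_flag() {
    flag=$1
    mode=$2   # "keep" records carrying the flag, or "drop" them
    awk -F'\t' -v flag="$flag" -v mode="$mode" '
        /^#/ { print; next }
        {
            found = 0
            # INFO is column 8; entries are separated by ";"
            n = split($8, keys, ";")
            for (i = 1; i <= n; i++)
                if (keys[i] == flag) found = 1
            if (mode == "keep" ? found : !found) print
        }
    '
}
```

It could be used as `zcat my_fav.vcf.gz | filter_by_info_flag IMPUTED drop`. Unlike bcftools, this does no validation of the VCF, so it will not catch corrupted records.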

3.7 years ago
caro1002 • 0

You can use the expression =1 (or =0) to test for the presence (or absence) of a flag. For example, either of the following commands will exclude all imputed SNPs:

bcftools view -e 'INFO/IMPUTED=1' my_fav.vcf.gz

or

bcftools view -i 'INFO/IMPUTED=0' my_fav.vcf.gz
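
As a rough sketch of how the same flag test could split one file into two, the function below writes imputed records to one compressed VCF and everything else (TYPED and TYPED_ONLY) to another. The function name split_by_imputation and the output file names are invented for this example; -Oz and -o are the usual bcftools view options for compressed output and output path.

```shell
# Hypothetical helper: split_by_imputation IN_VCF OUT_PREFIX
# Writes OUT_PREFIX.imputed.vcf.gz and OUT_PREFIX.not_imputed.vcf.gz
split_by_imputation() {
    in_vcf=$1
    prefix=$2
    # Records where the IMPUTED flag is present
    bcftools view -i 'INFO/IMPUTED=1' -Oz -o "${prefix}.imputed.vcf.gz" "$in_vcf"
    # Records where the IMPUTED flag is absent
    bcftools view -e 'INFO/IMPUTED=1' -Oz -o "${prefix}.not_imputed.vcf.gz" "$in_vcf"
}
```

For example, `split_by_imputation my_fav.vcf.gz my_fav` would produce my_fav.imputed.vcf.gz and my_fav.not_imputed.vcf.gz.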
