Question: BCF Tools Filter on 1000Genomes Annotation
gravatar for jon.klonowski
21 months ago by
jon.klonowski70 wrote:

I am trying to filter a VCF based on the allele frequency as given by ANNOVAR annotation, which has annotations in the info column. I am doing this because I only want the rare variants for analysis. I have been trying bcf query but it is not acting well behaved....

$ bcftools view -i 'INFO/controls_AF_popmax < 0.05' File_Name.vcf.gz  | bgzip > output.vcf.gz


$ bcftools query File_Name.vcf.gz -f '%ID\t %INFO/Gene.ensGene\t %INFO/controls_AF_popmax\n' -i 'INFO/controls_AF_popmax < 0.05' >  output.vcf

I get output but not wat expected in terms of the filter.

I want a vcf in the end so I can make ped, map, bed, bim, fam from it using PLINK (I know how to do that part).

UPDATE: I thoguht maybe it was because of the format of the INFO/controls_AF_popmax field in the header. I tried to change the designation to the same designation AF is but that did not work either.

UPDATE: It works for 'INFO/AF > 0.05' but not for 'INFO/controls_AF_popmax > 0.05' do not know why

ADD COMMENTlink modified 21 months ago • written 21 months ago by jon.klonowski70


could you please show us the complete header and some variants of your input file?

fin swimmer

ADD REPLYlink written 21 months ago by finswimmer14k

Looks like there were numerous extra info fields specified in the header that were not actually present. I just purged them, going to double check the header and try this all again.



what I changed:

  1. removed superfluous headers
  2. Changed format of the Annotation of interested to Float from String:

## INFO=<ID=controls_AF_popmax,Number=.,Type=Float,Description="controls_AF_popmax annotation provided by ANNOVAR">
ADD REPLYlink modified 21 months ago by finswimmer14k • written 21 months ago by jon.klonowski70


I had same problem on my vcf.

I also wanted to filter a VCF based on the allele frequency as given by ANNOVAR annotation. Like you said, I've changed my vcf INFO header (from "String" to "Float", only on site which I wanted to filter) and bcftools query (only) worked fine. BTW... excellent comment! Thank you very much!


##INFO=<ID=ExAC_EAS,Number=.,Type=String,Description="ExAC_EAS annotation provided by ANNOVAR">


##INFO=<ID=ExAC_EAS,Number=.,Type=Float,Description="ExAC_EAS annotation provided by ANNOVAR">


bcftools query input_edited.vcf -f '%CHROM\t%POS\t%ID\t%REF\t%ALT\t%INFO/ExAC_EAS\n' -i 'INFO/ExAC_EAS != "." & INFO/ExAC_EAS < 0.005'

With this command, my output is tabular.

I also tried bcftools view (to get a vcf output) and didn't work:

bcftools view -e 'ExAC_EAS="." & ExAC_EAS > 0.005' input_edited.vcf

I had a lot of errors like this: [W::vcf_parse_format]

I did some tests with 2 versions of vcf: before and after this modification. The error is still there in both cases.

I think the steps here are: use bcftools query, get ID's from variants of interest (in tabular format), use these ID's as query again against my original vcf to get variants in VCF format.

ADD REPLYlink modified 12 months ago • written 12 months ago by Monete R. Gomes0

UPDATED VCF header where i deleted all the contig information and filtering information so it would fit in a single screenshot:



Sorry i am not sure the best way to share...

and here is a record, not including sample information:

chr1    861219  chr1:861219:G:C G   C   490.79  PASS    AC=0;AF=0.0007669;AN=922;BaseQRankSum=2.72;DP=18855;ExcessHet=3.0103;FS=3.736;InbreedingCoeff=-0.0011;MLEAC=1;MLEAF=0.0007669;MQ=60;MQRankSum=0;QD=18.88;ReadPosRankSum=0.416;SOR=2.247;VQSLOD=13.64;culprit=MQRankSum;ANNOVAR_DATE=2018-04-16;Func.refGene=intronic;Gene.refGene=SAMD11;GeneDetail.refGene=.;ExonicFunc.refGene=.;AAChange.refGene=.;Func.ensGene=intronic;Gene.ensGene=ENSG00000187634;GeneDetail.ensGene=.;ExonicFunc.ensGene=.;AAChange.ensGene=.;AF=0.0001;AF_popmax=0.0004;AF_male=0.0001;AF_female=0.0001;AF_raw=0.0001;AF_afr=0;AF_sas=0.0004;AF_amr=0;AF_eas=0;AF_nfe=0.0001;AF_fin=0;AF_asj=0.0002;AF_oth=0;non_topmed_AF_popmax=0.0004;non_neuro_AF_popmax=0.0004;non_cancer_AF_popmax=0.0004;controls_AF_popmax=0.0003;MCAP13=.;CADD13_RawScore=.;CADD13_PHRED=.;gerp++elem=.;CLNALLELEID=.;CLNDN=.;CLNDISDB=.;CLNREVSTAT=.;CLNSIG=.;ALLELE_END GT:AD:DP:GQ:PL
ADD REPLYlink modified 21 months ago by finswimmer14k • written 21 months ago by jon.klonowski70

i will try to clean up my header completely and double check the positioning, maybe the annovar annotation didnt properly align the header and that is why i am getting the issue

ADD REPLYlink written 21 months ago by jon.klonowski70
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1016 users visited in the last hour