Question: BCF Tools Filter on 1000Genomes Annotation
2
gravatar for jon.klonowski
9 months ago by
jon.klonowski70 wrote:

I am trying to filter a VCF based on the allele frequency as given by ANNOVAR annotation, which has annotations in the info column. I am doing this because I only want the rare variants for analysis. I have been trying bcf query but it is not acting well behaved....

$ bcftools view -i 'INFO/controls_AF_popmax < 0.05' File_Name.vcf.gz  | bgzip > output.vcf.gz

or

$ bcftools query File_Name.vcf.gz -f '%ID\t %INFO/Gene.ensGene\t %INFO/controls_AF_popmax\n' -i 'INFO/controls_AF_popmax < 0.05' >  output.vcf

I get output but not wat expected in terms of the filter.

I want a vcf in the end so I can make ped, map, bed, bim, fam from it using PLINK (I know how to do that part).

UPDATE: I thoguht maybe it was because of the format of the INFO/controls_AF_popmax field in the header. I tried to change the designation to the same designation AF is but that did not work either.

UPDATE: It works for 'INFO/AF > 0.05' but not for 'INFO/controls_AF_popmax > 0.05' do not know why

ADD COMMENTlink modified 9 months ago • written 9 months ago by jon.klonowski70

Hello,

could you please show us the complete header and some variants of your input file?

fin swimmer

ADD REPLYlink written 9 months ago by finswimmer13k
1

Looks like there were numerous extra info fields specified in the header that were not actually present. I just purged them, going to double check the header and try this all again.

UPDATE:

AFTER UPDATING INFO IT THE IT APPEARS TO BE WORKING. Lets see

what I changed:

  1. removed superfluous headers
  2. Changed format of the Annotation of interested to Float from String:

## INFO=<ID=controls_AF_popmax,Number=.,Type=Float,Description="controls_AF_popmax annotation provided by ANNOVAR">
ADD REPLYlink modified 8 months ago by finswimmer13k • written 9 months ago by jon.klonowski70

Hi

I had same problem on my vcf.

I also wanted to filter a VCF based on the allele frequency as given by ANNOVAR annotation. Like you said, I've changed my vcf INFO header (from "String" to "Float", only on site which I wanted to filter) and bcftools query (only) worked fine. BTW... excellent comment! Thank you very much!

From:

##INFO=<ID=ExAC_EAS,Number=.,Type=String,Description="ExAC_EAS annotation provided by ANNOVAR">

To:

##INFO=<ID=ExAC_EAS,Number=.,Type=Float,Description="ExAC_EAS annotation provided by ANNOVAR">

Command:

bcftools query input_edited.vcf -f '%CHROM\t%POS\t%ID\t%REF\t%ALT\t%INFO/ExAC_EAS\n' -i 'INFO/ExAC_EAS != "." & INFO/ExAC_EAS < 0.005'

With this command, my output is tabular.

I also tried bcftools view (to get a vcf output) and didn't work:

bcftools view -e 'ExAC_EAS="." & ExAC_EAS > 0.005' input_edited.vcf

I had a lot of errors like this: [W::vcf_parse_format]

I did some tests with 2 versions of vcf: before and after this modification. The error is still there in both cases.

I think the steps here are: use bcftools query, get ID's from variants of interest (in tabular format), use these ID's as query again against my original vcf to get variants in VCF format.

ADD REPLYlink modified 5 days ago • written 5 days ago by moneterg0

UPDATED VCF header where i deleted all the contig information and filtering information so it would fit in a single screenshot:

PREVIOUS VCF HEADER https://ibb.co/rZRtR7F

UPDATED VCF HEADER https://ibb.co/n8cWxm9

Sorry i am not sure the best way to share...

and here is a record, not including sample information:

chr1    861219  chr1:861219:G:C G   C   490.79  PASS    AC=0;AF=0.0007669;AN=922;BaseQRankSum=2.72;DP=18855;ExcessHet=3.0103;FS=3.736;InbreedingCoeff=-0.0011;MLEAC=1;MLEAF=0.0007669;MQ=60;MQRankSum=0;QD=18.88;ReadPosRankSum=0.416;SOR=2.247;VQSLOD=13.64;culprit=MQRankSum;ANNOVAR_DATE=2018-04-16;Func.refGene=intronic;Gene.refGene=SAMD11;GeneDetail.refGene=.;ExonicFunc.refGene=.;AAChange.refGene=.;Func.ensGene=intronic;Gene.ensGene=ENSG00000187634;GeneDetail.ensGene=.;ExonicFunc.ensGene=.;AAChange.ensGene=.;AF=0.0001;AF_popmax=0.0004;AF_male=0.0001;AF_female=0.0001;AF_raw=0.0001;AF_afr=0;AF_sas=0.0004;AF_amr=0;AF_eas=0;AF_nfe=0.0001;AF_fin=0;AF_asj=0.0002;AF_oth=0;non_topmed_AF_popmax=0.0004;non_neuro_AF_popmax=0.0004;non_cancer_AF_popmax=0.0004;controls_AF_popmax=0.0003;MCAP13=.;CADD13_RawScore=.;CADD13_PHRED=.;gerp++elem=.;CLNALLELEID=.;CLNDN=.;CLNDISDB=.;CLNREVSTAT=.;CLNSIG=.;ALLELE_END GT:AD:DP:GQ:PL
ADD REPLYlink modified 8 months ago by finswimmer13k • written 9 months ago by jon.klonowski70

i will try to clean up my header completely and double check the positioning, maybe the annovar annotation didnt properly align the header and that is why i am getting the issue

ADD REPLYlink written 9 months ago by jon.klonowski70
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1196 users visited in the last hour