Question: bcftools query for multi-allelic cases
gravatar for tiplud
2.9 years ago by
tiplud0 wrote:

Hi Everyone, I have a question regarding how to efficiently obtain the output using bcftools query when there is a multi-allelic case. I have a variable test_variants which contains the following variant information : START\tPOS\tREF\tALT. Two example lines are :

1   16894472    A   G
1   16894476    G   C

I would like to get the dbSNP id and Allele Frequency for these entries from the gnomAD exome vcf file ( $vcf_exome ). When I run the following command :

bcftools view -O v -R <(echo "$test_variants") "$vcf_exome"  | grep -Ef <(awk 'BEGIN{FS=OFS="\t";print "#"};{print "^"$1,$2,"[^\t]+",$3}' <(echo "$test_variants")) | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%ID\t%AF\n'
the output I get is : 
1   16894472    A   G,C rs763081531 2,0 8.5764e-06,0

1   16894476    G   C,T,A   rs4661811   1,33,1  4.32436e-06,0.000142704,4.32436e-06

Is there a way in bcftools to output the information for my only ALT allele ? I feel I need to post-process the output with awk and get the index of the matching ALT allele from the 4th column of the output. In such a case, I am not sure how to do this in a batch way. Thank you! Debayan

snp next-gen • 1.3k views
ADD COMMENTlink modified 2.9 years ago by Pierre Lindenbaum126k • written 2.9 years ago by tiplud0
gravatar for Pierre Lindenbaum
2.9 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum126k wrote:

using awk

echo -e  "1,16894472,A,G\n1,16894476,G,T\n1,16894476,G,C" | while IFS="," read -ra T; do  bcftools view "${T[0]}.vcf.gz" "${T[0]}:${T[1]}-${T[1]}" | awk -F '\t' -v P=${T[1]} -v R=${T[2]} -v A=${T[3]} '/^#/ {next;} {if($2!=P || $4!=R) next;nalts=split($5,alts,/,/);for(i=1;i<=nalts;++i) {if(A!=alts[i]) continue;  natts=split($8,atts,/;/);for(j=1;j<=natts;++j) {if(substr(atts[j],1,3)!="AF=") continue;nafs=split(substr(atts[j],4),afs,/,/); print $1,$2,$3,$4,A,afs[i];}} }' ; done

1 16894472 rs763081531 A G 0.000424418
1 16894476 rs4661811 G T 0.000109649
1 16894476 rs4661811 G C 0.000584795
ADD COMMENTlink written 2.9 years ago by Pierre Lindenbaum126k

Amazing ! Works perfect ! Thanks a lot :)

ADD REPLYlink written 2.9 years ago by tiplud0

it if answers the question, please, check the green mark on the left to close the question.

ADD REPLYlink written 2.9 years ago by Pierre Lindenbaum126k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1520 users visited in the last hour