Question: How to allele frequency for each variant in VCF file
0
gravatar for Nyksubuz
8 weeks ago by
Nyksubuz0
Nyksubuz0 wrote:

I have a VCF file, and I need to calculate allele frequency for each variant by using code/script, I have the values for AN and AC within the file.

perl gene variant genome vcf • 126 views
ADD COMMENTlink modified 8 weeks ago • written 8 weeks ago by Nyksubuz0

What have you tried on your own? Why do you want to use perl?

Can you give us a list of tools you've looked at that can manipulate VCF files? You've been given this advice before - google as much as you can.

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by RamRS30k

I don't want to use a tool, i want to write a script to do the same

ADD REPLYlink written 8 weeks ago by Nyksubuz0

Use a tool. Do not reinvent the wheel. If you insist on scripting it yourself, read the VCF documentation and figure out how to calculate AF from AC and AN.

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by RamRS30k

it is possible to calculate allele frequency by dividing AC/AN, I just need help with writing the script to do the same

ADD REPLYlink written 8 weeks ago by Nyksubuz0

What exactly have you tried? Show us your script and where you need help specifically - we are not going to do your homework for you.

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by RamRS30k

first of all its not a home work question, im learning variant analysis, idk why you are getting so offended

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by Nyksubuz0

I apologize - questions need to be more specific and show that the asker has invested effort in trying to solve things themselves, which this post lacks. Add that and we're all open to helping you.

ADD REPLYlink written 8 weeks ago by RamRS30k
#!/bin/bash
# Extract AC values
awk 'BEGIN{n=17}NR<=n{next}{ print $8 }' CEU.exon.2010_09.genotypes.vcf | sed -e s/AA=.\;AC=// | sed -e s/\;AN.*// > ac

# Extract AN values
awk 'BEGIN{n=17}NR<=n{next}{ print $8 }' CEU.exon.2010_09.genotypes.vcf | sed -e s/AA=.\;AC=[0-9]*\;AN=// | sed -e s/\;DP.*// > an

# Get AF values
paste ac an > tmp
awk '{print $1/$2}' tmp > tmp2

awk 'BEGIN{n=17}NR<=n{next}{ print $3}' CEU.exon.2010_09.genotypes.vcf > tmp3
paste tmp3 tmp2 > af
echo "Generated AF file"

rm tmp tmp2 tmp3 ac an

echo "First 10 lines"
head af

echo "WC"
wc af
ADD REPLYlink modified 8 weeks ago by RamRS30k • written 8 weeks ago by Nyksubuz0

I did this much, it's pretty long, i need modify it or find any other way to calculate AF

ADD REPLYlink written 8 weeks ago by Nyksubuz0
1

You should replace your awk with bcftools query. That will make this simple.

bcftools query -f '%INFO/AC\t%INFO/AF\n' CEU.exon.2010_09.genotypes.vcf | awk -F "\t" -vOFS="\t" {print $1,$2,$1/$2} >af.txt

Of course, the above can be used to maybe generate a histogram of AF, not map anything back to any locus, because you're literally just calculating the AF, not storing it in any meaningful way.

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by RamRS30k

ok, thank you, that helps a lot

ADD REPLYlink written 8 weeks ago by Nyksubuz0

I tried using grep and awk to extract the vaues from the columns, im stuck with the division of column by column

ADD REPLYlink written 8 weeks ago by Nyksubuz0

awk can do division (see example below). Show us your script and an example where it fails.

➜ echo -e "4\t2" | awk -F"\t" '{print $1/$2}'
2
ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by RamRS30k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2027 users visited in the last hour