How to allele frequency for each variant in VCF file
0
0
Entering edit mode
3.7 years ago
Nyksubuz ▴ 10

I have a VCF file, and I need to calculate allele frequency for each variant by using code/script, I have the values for AN and AC within the file.

vcf variant genome gene perl • 3.2k views
ADD COMMENT
0
Entering edit mode

What have you tried on your own? Why do you want to use perl?

Can you give us a list of tools you've looked at that can manipulate VCF files? You've been given this advice before - google as much as you can.

ADD REPLY
0
Entering edit mode

I don't want to use a tool, i want to write a script to do the same

ADD REPLY
0
Entering edit mode

Use a tool. Do not reinvent the wheel. If you insist on scripting it yourself, read the VCF documentation and figure out how to calculate AF from AC and AN.

ADD REPLY
0
Entering edit mode

it is possible to calculate allele frequency by dividing AC/AN, I just need help with writing the script to do the same

ADD REPLY
0
Entering edit mode

What exactly have you tried? Show us your script and where you need help specifically - we are not going to do your homework for you.

ADD REPLY
0
Entering edit mode

first of all its not a home work question, im learning variant analysis, idk why you are getting so offended

ADD REPLY
0
Entering edit mode

I apologize - questions need to be more specific and show that the asker has invested effort in trying to solve things themselves, which this post lacks. Add that and we're all open to helping you.

ADD REPLY
0
Entering edit mode
#!/bin/bash
# Extract AC values
awk 'BEGIN{n=17}NR<=n{next}{ print $8 }' CEU.exon.2010_09.genotypes.vcf | sed -e s/AA=.\;AC=// | sed -e s/\;AN.*// > ac

# Extract AN values
awk 'BEGIN{n=17}NR<=n{next}{ print $8 }' CEU.exon.2010_09.genotypes.vcf | sed -e s/AA=.\;AC=[0-9]*\;AN=// | sed -e s/\;DP.*// > an

# Get AF values
paste ac an > tmp
awk '{print $1/$2}' tmp > tmp2

awk 'BEGIN{n=17}NR<=n{next}{ print $3}' CEU.exon.2010_09.genotypes.vcf > tmp3
paste tmp3 tmp2 > af
echo "Generated AF file"

rm tmp tmp2 tmp3 ac an

echo "First 10 lines"
head af

echo "WC"
wc af
ADD REPLY
0
Entering edit mode

I did this much, it's pretty long, i need modify it or find any other way to calculate AF

ADD REPLY
1
Entering edit mode

You should replace your awk with bcftools query. That will make this simple.

bcftools query -f '%INFO/AC\t%INFO/AF\n' CEU.exon.2010_09.genotypes.vcf | awk -F "\t" -vOFS="\t" {print $1,$2,$1/$2} >af.txt

Of course, the above can be used to maybe generate a histogram of AF, not map anything back to any locus, because you're literally just calculating the AF, not storing it in any meaningful way.

ADD REPLY
0
Entering edit mode

ok, thank you, that helps a lot

ADD REPLY
0
Entering edit mode

I tried using grep and awk to extract the vaues from the columns, im stuck with the division of column by column

ADD REPLY
0
Entering edit mode

awk can do division (see example below). Show us your script and an example where it fails.

➜ echo -e "4\t2" | awk -F"\t" '{print $1/$2}'
2
ADD REPLY

Login before adding your answer.

Traffic: 2294 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6