Identify the most and least abundant alleles (nMAJ and nMIN) for pooled heterozygosity (Hp) analysis from a VCF file
2.8 years ago
tothepoint ▴ 800

I am trying to calculate pooled heterozygosity (Hp) in 150 kb sliding windows by identifying nMAJ and nMIN from a VCF file. The papers I have read calculate Hp with the formula below, but they do not describe how nMAJ and nMIN are obtained. Could you please share how to calculate them? I would be grateful.

$$H_p = \frac{2 \sum n_{MAJ} \sum n_{MIN}}{\left(\sum n_{MAJ} + \sum n_{MIN}\right)^2}$$

Thank you

selection wgs vcf gatk • 1.0k views

Hi tothepoint,

I have the same question. It has been about three months since you asked. If you have received an answer in the meantime, please share it so that we can use it as well.

Thanks

2.8 years ago
GenomeXP • 0

Hi Devarora,

As far as I understand, for every SNP you need the count of the most common allele (nMAJ) and the count of the least common allele (nMIN). You then sum the nMAJ values over all SNPs in your 150 kb window, and do the same for the nMIN values.
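To make the summing step concrete, here is a minimal Python sketch. It assumes you already have, for each SNP in one window, the counts of its two alleles; the function name `pooled_heterozygosity` and the example counts are made up for illustration.

```python
def pooled_heterozygosity(site_counts):
    """Compute Hp for one window.

    site_counts: iterable of (count_a, count_b) pairs, one pair per
    biallelic SNP in the window (hypothetical input layout).
    Returns None when the window holds no counts.
    """
    sum_maj = 0
    sum_min = 0
    for count_a, count_b in site_counts:
        # The major allele is decided per SNP, not per window.
        sum_maj += max(count_a, count_b)
        sum_min += min(count_a, count_b)
    total = sum_maj + sum_min
    if total == 0:
        return None
    return 2 * sum_maj * sum_min / total ** 2


# Three made-up SNPs in one window.
window = [(30, 10), (5, 25), (20, 20)]
hp = pooled_heterozygosity(window)
print(hp)
```

Here the sums are 75 for nMAJ and 35 for nMIN, so Hp = 2 * 75 * 35 / 110^2. Note that the major allele at the second SNP is the second allele, which is why the maximum is taken site by site.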

You can extract the allele counts from the VCF (the format is described at https://www.internationalgenome.org/wiki/Analysis/vcf4.0/). For this you will probably need to write a short script in bash, awk, Perl, or Python that extracts the right column and retrieves the numbers you need to sum.
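A rough sketch of such a script in Python follows. It rests on several assumptions that may not match your data: that the VCF has a single pooled sample, that its FORMAT column contains an `AD` field (per-allele read depths, as GATK usually writes), and that those read depths are the counts you want for nMAJ and nMIN. For brevity it uses non-overlapping 150 kb windows rather than overlapping sliding ones, and the function names are only illustrative.

```python
from collections import defaultdict

WINDOW_SIZE = 150_000  # window size from the question


def allele_depths(vcf_line, sample_index=0):
    """Return (chrom, pos, depths) for one VCF data line, or None if AD is unusable."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10 + sample_index:
        return None  # no FORMAT/sample columns
    format_keys = fields[8].split(":")
    if "AD" not in format_keys:
        return None  # assumption: counts come from the AD field
    sample_values = fields[9 + sample_index].split(":")
    ad_index = format_keys.index("AD")
    if ad_index >= len(sample_values):
        return None
    try:
        depths = [int(x) for x in sample_values[ad_index].split(",")]
    except ValueError:  # missing values such as "."
        return None
    return fields[0], int(fields[1]), depths


def hp_per_window(vcf_lines, window_size=WINDOW_SIZE, sample_index=0):
    """Map (chrom, window_number) to Hp, using biallelic sites only."""
    sums = defaultdict(lambda: [0, 0])  # [sum of nMAJ, sum of nMIN]
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # header lines
        parsed = allele_depths(line, sample_index)
        if parsed is None:
            continue
        chrom, pos, depths = parsed
        if len(depths) != 2:
            continue  # skip multiallelic sites
        key = (chrom, (pos - 1) // window_size)
        sums[key][0] += max(depths)
        sums[key][1] += min(depths)
    result = {}
    for key, (n_maj, n_min) in sums.items():
        total = n_maj + n_min
        if total > 0:
            result[key] = 2 * n_maj * n_min / total ** 2
    return result


# Made-up VCF lines for illustration.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tpool1",
    "chr1\t1000\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t0/1:30,10:40",
    "chr1\t2000\t.\tC\tT\t50\tPASS\t.\tGT:AD:DP\t0/1:5,25:30",
    "chr1\t160000\t.\tG\tA\t50\tPASS\t.\tGT:AD:DP\t0/1:20,20:40",
]
hp = hp_per_window(example)
for (chrom, window), value in sorted(hp.items()):
    print(chrom, window * WINDOW_SIZE + 1, (window + 1) * WINDOW_SIZE, value)
```

To run it on a real file you could pass an open file handle, for example `hp_per_window(open("pool.vcf"))`, where `pool.vcf` is a placeholder name. For overlapping windows, one option is to add each SNP to every window that covers its position.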

Best,

Guenole
