Identify the most and least abundant alleles (nMAJ and nMIN) for pooled heterozygosity (Hp) analysis from a VCF file
3 months ago
tothepoint ▴ 600

I am trying to calculate pooled heterozygosity (Hp) from a VCF file in 150 kb sliding windows, which requires identifying nMAJ and nMIN. After reading several papers I am confused: they give the formula but no particular method for obtaining nMAJ and nMIN. Could you please share how to calculate them? I would be grateful.

$$H_p = \frac{2\sum n_{MAJ}\sum n_{MIN}}{\left(\sum n_{MAJ} + \sum n_{MIN}\right)^2}$$
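As a quick check of the formula, here is a minimal Python sketch (the function name is hypothetical) that computes Hp from the two window sums. Hp is 0 when the summed minor-allele count is 0 and reaches its maximum of 0.5 when the summed major and minor counts are equal.

```python
def pooled_heterozygosity(sum_maj: float, sum_min: float) -> float:
    """Hp = 2 * sum(nMAJ) * sum(nMIN) / (sum(nMAJ) + sum(nMIN))**2."""
    total = sum_maj + sum_min
    if total == 0:
        raise ValueError("window has no allele counts")
    return 2 * sum_maj * sum_min / total ** 2


# Hypothetical window: summed nMAJ = 300, summed nMIN = 100
print(pooled_heterozygosity(300, 100))  # 0.375
```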

Thank you

selection wgs vcf gatk
3 months ago
GenomeXP • 0

Hi Devarora,

As far as I understand, for every SNP you have to find the most common allele and take its count (nMAJ), then sum these major-allele counts over all SNPs in your 150 kb window. The same goes for the least common allele at every SNP (nMIN): again, you sum its counts over the window.
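The per-window sums described above can be sketched in Python roughly like this. It is a minimal sketch, assuming the counts for one chromosome are already available as hypothetical (position, count_allele_a, count_allele_b) tuples. The function name `windowed_hp`, the 1-based window start, and the `step` parameter (set it smaller than `window` to get overlapping sliding windows) are my own choices, not taken from any paper.

```python
from bisect import bisect_left, bisect_right
from typing import Iterable, List, Tuple


def windowed_hp(
    snps: Iterable[Tuple[int, int, int]],
    window: int = 150_000,
    step: int = 150_000,
) -> List[Tuple[int, int, float]]:
    """Return (window_start, window_end, Hp) for one chromosome.

    snps holds hypothetical (position, count_allele_a, count_allele_b) tuples.
    """
    ordered = sorted(snps)
    if not ordered:
        return []
    positions = [pos for pos, _, _ in ordered]
    results = []
    start = 1  # VCF positions are 1-based
    while start <= positions[-1]:
        end = start + window - 1
        lo = bisect_left(positions, start)
        hi = bisect_right(positions, end)
        sum_maj = sum_min = 0
        for _, a, b in ordered[lo:hi]:
            sum_maj += max(a, b)  # nMAJ: count of the more abundant allele at this SNP
            sum_min += min(a, b)  # nMIN: count of the less abundant allele at this SNP
        total = sum_maj + sum_min
        if total > 0:  # skip windows without SNPs or without coverage
            results.append((start, end, 2 * sum_maj * sum_min / total ** 2))
        start += step
    return results


snps = [(1_000, 30, 10), (50_000, 5, 15), (200_000, 20, 20)]
print(windowed_hp(snps))  # [(1, 150000, 0.375), (150001, 300000, 0.5)]
```

Note that the major allele is decided separately at each SNP from the counts in the pool, so it does not have to be the REF or the ALT allele consistently across the window.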

You can extract the allele counts from the VCF (https://www.internationalgenome.org/wiki/Analysis/vcf4.0/). For this you will probably need to write a small script in bash, awk, Perl or Python that extracts the right column and retrieves the numbers you need to sum.
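A minimal Python sketch of that extraction step, assuming that (as is common for pooled sequencing) the "count" of an allele is its read depth, that each pool is one sample column, and that the caller wrote the per-sample allelic depths in the `AD` FORMAT field (REF depth first, then ALT), which GATK normally does. It keeps only biallelic SNPs; the function name is hypothetical. If you have bcftools, `bcftools query -f '%CHROM\t%POS\t[%AD]\n' file.vcf` should print the same AD values as plain text.

```python
import gzip
from typing import Iterator, Tuple


def read_allele_depths(path: str, sample_index: int = 0) -> Iterator[Tuple[str, int, int, int]]:
    """Yield (chrom, pos, ref_depth, alt_depth) for biallelic SNPs that carry an AD field."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) <= 9 + sample_index:
                continue  # no genotype column for the requested sample
            chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
            # keep only biallelic SNPs: one-base REF and a single one-base ALT
            if len(ref) != 1 or len(alt) != 1 or alt in (".", "*"):
                continue
            fmt_keys = fields[8].split(":")
            if "AD" not in fmt_keys:
                continue
            sample = fields[9 + sample_index].split(":")
            ad_idx = fmt_keys.index("AD")
            if ad_idx >= len(sample) or sample[ad_idx] in (".", ""):
                continue  # AD missing for this sample
            depths = sample[ad_idx].split(",")
            if len(depths) != 2 or "." in depths:
                continue
            yield chrom, pos, int(depths[0]), int(depths[1])
```

Grouping the yielded (position, ref_depth, alt_depth) values by chromosome gives the per-SNP counts whose maximum and minimum are summed in each window.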

Best,

Guenole
