According to Rubin et al., one method for identifying selection signatures in a genome-scale study is to calculate pooled heterozygosity (Hp).

Hp = 2 ΣnMAJ ΣnMIN / (ΣnMAJ + ΣnMIN)^2, where nMAJ and nMIN are the numbers of reads corresponding to the most and least abundant allele, respectively, and each sum is taken over all SNPs in a defined window (for example, 40 kb) across the genome. Hp is then Z-transformed.
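To make sure I am reading the formula correctly, here is a minimal sketch of it in Python. It assumes the per-SNP read counts for one window are already available as (nMAJ, nMIN) pairs. The function names are my own, and the Z-transformation is taken to be the usual (Hp - mean) / standard deviation over all windows, using the population standard deviation (an assumption on my part).

```python
from statistics import mean, pstdev


def pooled_heterozygosity(counts):
    """Hp for one window.

    counts: list of (n_major, n_minor) read-count pairs, one per SNP.
    Returns None when the window has no reads at all.
    """
    sum_maj = sum(n_maj for n_maj, _ in counts)
    sum_min = sum(n_min for _, n_min in counts)
    total = sum_maj + sum_min
    if total == 0:
        return None
    return 2 * sum_maj * sum_min / total ** 2


def z_transform(hp_values):
    """Z-transform Hp values across windows: (Hp - mean) / sd.

    Assumption: population standard deviation (pstdev) is used.
    """
    mu = mean(hp_values)
    sigma = pstdev(hp_values)
    if sigma == 0:
        raise ValueError("all Hp values are identical; Z-scores are undefined")
    return [(hp - mu) / sigma for hp in hp_values]


# Hypothetical window with two SNPs: (major reads, minor reads)
window = [(8, 2), (12, 8)]
hp = pooled_heterozygosity(window)
```

Windows with strongly negative Z-scores are the ones with unusually low heterozygosity.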

Many articles use this method, but unfortunately I cannot find any command for it in their supplementary materials. A lot of searching on Google did not answer my question either.

My question here is actually the third question on this issue, but the previous two questions have unfortunately not been answered. I hope this time with your help I can find an answer to this question.

How can I get nMAJ and nMIN from a multi-sample VCF file (produced by GATK) and then calculate Hp?
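For reference, this is the kind of approach I have in mind, as a rough sketch only. It assumes that the VCF has an AD (allelic depth) entry in the FORMAT column, that summing AD over all samples is an acceptable stand-in for pooled read counts, that only biallelic SNPs are used, and that windows are fixed and non-overlapping. The function name window_counts and these choices are mine, not taken from any paper.

```python
from collections import defaultdict


def window_counts(vcf_lines, window_size=40_000):
    """Sum nMAJ and nMIN per window from uncompressed VCF text lines.

    Returns {(chrom, window_index): [sum_nMAJ, sum_nMIN]}.
    """
    windows = defaultdict(lambda: [0, 0])
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if len(ref) != 1 or len(alt) != 1:
            continue  # keep biallelic SNPs only (drops indels, multi-allelic sites)
        format_keys = fields[8].split(":")
        if "AD" not in format_keys:
            continue
        ad_index = format_keys.index("AD")
        ref_reads = alt_reads = 0
        for sample in fields[9:]:
            parts = sample.split(":")
            if ad_index >= len(parts):
                continue  # trailing fields dropped for this sample
            ad = parts[ad_index].split(",")
            if len(ad) != 2 or "." in ad:
                continue  # missing allelic depths
            ref_reads += int(ad[0])
            alt_reads += int(ad[1])
        if ref_reads + alt_reads == 0:
            continue
        window = windows[(chrom, (pos - 1) // window_size)]
        window[0] += max(ref_reads, alt_reads)  # nMAJ for this SNP
        window[1] += min(ref_reads, alt_reads)  # nMIN for this SNP
    return dict(windows)


# Tiny made-up example with two samples
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t0/1:6,4:10\t1/1:0,10:10",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:AD:DP\t0/0:10,0:10\t./.:.:.",
    "chr1\t300\t.\tA\tAT\t50\tPASS\t.\tGT:AD\t0/1:5,5\t0/1:5,5",
    "chr1\t50000\t.\tG\tA\t50\tPASS\t.\tGT:AD\t0/1:5,5\t0/1:5,5",
]
counts = window_counts(example)
hp_by_window = {
    key: 2 * n_maj * n_min / (n_maj + n_min) ** 2
    for key, (n_maj, n_min) in counts.items()
}
```

For a real file, the lines could come from an opened VCF (for example via gzip.open(path, "rt") for a .vcf.gz). Whether summing AD across individually sequenced samples is a fair substitute for pooled sequencing reads is exactly the part I am unsure about.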

Thanks in advance

Hi, I'm having exactly the same issue here. I want to calculate the pooled He using Rubin's method. I have looked for a script to calculate it in many papers but could not find one. I am now trying to write a function to calculate it, but I am not sure whether I'm doing it properly. I wonder how you coped with it in the end. Thank you anyway for posting this question, so that I know I'm not the only one having this problem.