Following Rubin et al., one method for identifying selection signatures in a genome-scale study is to calculate pooled heterozygosity (Hp).
Hp = 2·ΣnMAJ·ΣnMIN / (ΣnMAJ + ΣnMIN)^2, where nMAJ and nMIN are the numbers of reads corresponding to the most and least abundant allele, respectively, and each sum is taken over the SNPs in a defined window (for example, 40 kb) across the genome. Hp is then Z-transformed.
Many articles use this method, but unfortunately I cannot find any command for it in their supplementary materials, and a lot of searching on Google did not turn up an answer either.
This is actually the third question I have asked on this issue; unfortunately, the previous two were not answered. I hope that this time, with your help, I can find an answer.
How can I get nMAJ and nMIN from a multi-sample VCF file (produced by GATK) and then calculate Hp?
Thanks in advance
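For what it's worth, here is a minimal Python sketch of the calculation described above; it is not the script used by Rubin et al. It makes several assumptions that you should check against your own data: the VCF is uncompressed text and carries the standard GATK AD (allelic depth) FORMAT field, only biallelic SNPs are used, read counts are summed across all samples in the file as if they were one pool, windows are fixed and non-overlapping, and the Z-transform is (Hp - mean) / standard deviation over all windows. The function names (allele_depths, pooled_heterozygosity, z_transform) are made up for this example.

```python
import math
from collections import defaultdict


def allele_depths(vcf_line):
    """Return (chrom, pos, ref_reads, alt_reads) summed over all samples,
    or None if the record is not a biallelic SNP with an AD field."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:  # need 8 fixed columns + FORMAT + >= 1 sample
        return None
    chrom, pos = fields[0], int(fields[1])
    ref, alt = fields[3].upper(), fields[4].upper()
    # Assumption: keep only biallelic SNPs (single-base REF and ALT).
    if len(ref) != 1 or len(alt) != 1 or ref not in "ACGT" or alt not in "ACGT":
        return None
    fmt = fields[8].split(":")
    if "AD" not in fmt:
        return None
    ad_idx = fmt.index("AD")
    ref_n = alt_n = 0
    for sample in fields[9:]:
        parts = sample.split(":")
        if ad_idx >= len(parts):
            continue  # trailing FORMAT fields may be dropped for a sample
        ad = parts[ad_idx].split(",")
        if len(ad) != 2 or "." in ad:
            continue  # missing depth for this sample
        ref_n += int(ad[0])
        alt_n += int(ad[1])
    return chrom, pos, ref_n, alt_n


def pooled_heterozygosity(vcf_lines, window=40_000):
    """Hp per (chrom, window_index) using fixed, non-overlapping windows."""
    sums = defaultdict(lambda: [0, 0])  # key -> [sum nMAJ, sum nMIN]
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        rec = allele_depths(line)
        if rec is None:
            continue
        chrom, pos, ref_n, alt_n = rec
        if ref_n + alt_n == 0:
            continue
        key = (chrom, (pos - 1) // window)
        sums[key][0] += max(ref_n, alt_n)  # nMAJ at this SNP
        sums[key][1] += min(ref_n, alt_n)  # nMIN at this SNP
    return {k: 2 * maj * mn / (maj + mn) ** 2 for k, (maj, mn) in sums.items()}


def z_transform(hp):
    """ZHp = (Hp - mean) / sd over all windows (population sd)."""
    values = list(hp.values())
    if not values:
        return {}
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if sd == 0:
        return {k: 0.0 for k in hp}
    return {k: (v - mean) / sd for k, v in hp.items()}
```

You would call it along the lines of `with open("calls.vcf") as fh: zhp = z_transform(pooled_heterozygosity(fh))`, where calls.vcf is a placeholder file name; for a bgzipped file, `gzip.open(path, "rt")` should work in place of `open`. Note that the original method was described for pooled sequencing, so summing AD across individually genotyped samples is only an approximation of a pool, and many papers also filter windows with very few SNPs before the Z-transform, which this sketch does not do.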
Hi, I'm having exactly the same issue. I want to calculate pooled heterozygosity (Hp) using Rubin's method. I have looked for a script to calculate it in many papers but could not find one. I am now trying to write a function to calculate it myself, but I am not sure whether I am doing it properly.
I wonder how you coped with it in the end. Thank you anyway for posting this question; at least I know I'm not the only one with this problem.