Question: Calculating heterozygosity for each SNP
2.2 years ago, mostafarafiepour wrote:

Hi all,

I want to calculate the heterozygosity of each SNP. After reading the PLINK manual, I tried to calculate it with the following command:

plink  --make-bed --file purebred411_qc --freqx --out freqx_411

And I got this output:

CHR SNP         A1  A2  C(HOM A1)   C(HET)  C(HOM A2)   C(HAP A1)   C(HAP A2)   C(MISSING)
1   AX-85111653 3   1   45           187     178         0          0            1
1   AX-85043398 2   4   45           186     180         0          0            0
1   AX-85051079 4   2   5            71      335         0          0            0
1   AX-85154093 4   2   5            72      332         0          0            2
1   AX-85063459 3   1   56           199     155         0          0            1

First, did I calculate the heterozygosity of each SNP correctly?

Second, if so, how can I calculate the percentage heterozygosity of each SNP?

Best regards


2.2 years ago, Kevin Blighe (Republic of Ireland) wrote:

Hello Mostafa,

Here is what the plink manual states:

Allele frequency

  • --freq <counts | case-control> <gz>
  • --freqx <gz> (alias: --frqx)

By itself, --freq writes a minor allele frequency report to plink.frq. If you add the 'counts' modifier, an allele count report is written to plink.frq.count instead. Alternatively, you can use --freq with --within/--family to write a cluster-stratified frequency report to plink.frq.strat, or use the 'case-control' modifier to write a case/control phenotype-stratified report to

--freqx writes a more informative genotype count report to plink.frqx.

For both flags, gzipped output can be requested with the 'gz' modifier.

Nonfounders are normally excluded from these counts/frequencies; use --nonfounders to change this.

All of these reports (except for --freq + --within/--family) are valid input for --read-freq; --freqx is the most powerful when used in that capacity, since it preserves deviation from Hardy-Weinberg equilibrium.



You used --freqx. Here is a description of the output:

.frqx (genotype count report)

Produced by --freqx. Valid input for --read-freq.

A text file with a header line, and then one line per variant with the following ten fields:

  • CHR Chromosome code
  • SNP Variant identifier
  • A1 Allele 1 (usually minor)
  • A2 Allele 2 (usually major)
  • C(HOM A1) A1 homozygote count
  • C(HET) Heterozygote count
  • C(HOM A2) A2 homozygote count
  • C(HAP A1) Haploid A1 count (includes male X chromosome)
  • C(HAP A2) Haploid A2 count
  • C(MISSING) Missing genotype count
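
Given these fields, one common way to get a per-SNP heterozygosity percentage is to divide C(HET) by the number of non-missing diploid genotypes, C(HOM A1) + C(HET) + C(HOM A2), and multiply by 100. The Python sketch below does this for the first rows of the output posted in the question. It is only an illustration under some assumptions: "heterozygosity" is taken to mean observed heterozygosity, the first line is taken to be the header, the haploid counts are ignored, and the function name observed_het_percent is hypothetical (it is not part of PLINK).

```python
# Rows copied from the .frqx output in the question (whitespace separated).
SAMPLE = """CHR SNP A1 A2 C(HOM A1) C(HET) C(HOM A2) C(HAP A1) C(HAP A2) C(MISSING)
1 AX-85111653 3 1 45 187 178 0 0 1
1 AX-85043398 2 4 45 186 180 0 0 0
1 AX-85051079 4 2 5 71 335 0 0 0
"""


def observed_het_percent(frqx_lines):
    """Map each SNP id to its observed heterozygosity in percent.

    Assumes the first line is the header and that data fields contain no
    spaces, so columns can be read by position (the header itself contains
    spaces, e.g. "C(HOM A1)", and is therefore skipped rather than parsed).
    """
    result = {}
    for i, line in enumerate(frqx_lines):
        fields = line.split()
        if i == 0 or not fields:
            continue  # skip the header line and any blank lines
        snp = fields[1]
        hom_a1, het, hom_a2 = (int(x) for x in fields[4:7])
        genotyped = hom_a1 + het + hom_a2  # missing genotypes are excluded
        result[snp] = 100.0 * het / genotyped if genotyped else float("nan")
    return result


for snp, pct in observed_het_percent(SAMPLE.splitlines()).items():
    print(f"{snp}\t{pct:.2f}")
```

To run it on the real file, the lines of freqx_411.frqx could be passed in place of SAMPLE.splitlines().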



Final piece of information: it looks like your bases are encoded in 1,2,3,4 format (A,C,G,T == 1,2,3,4).
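
If that guess about the encoding is right, the A1 and A2 columns can be translated back to letters with a small lookup. This is a minimal Python sketch that assumes the 1,2,3,4 == A,C,G,T mapping mentioned above; the names ALLELE_CODES and decode_allele are hypothetical. PLINK 1.9 also documents an --alleleACGT flag for recoding the data itself, which may be the simpler route.

```python
# Assumed mapping, based on the guess that alleles are coded 1,2,3,4 == A,C,G,T.
ALLELE_CODES = {"1": "A", "2": "C", "3": "G", "4": "T"}


def decode_allele(code):
    """Return the letter for a numeric allele code; leave other values unchanged."""
    return ALLELE_CODES.get(code, code)


# A1 and A2 of the first SNP in the question (AX-85111653) are 3 and 1.
print(decode_allele("3"), decode_allele("1"))
```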

So, now you should understand your output and, I believe, you will know whether or not you have chosen the correct program / command.

