Question: Calculating heterozygosity for each SNP
21 months ago, mostafarafiepour wrote:

Hi all,

I want to calculate the heterozygosity of each SNP. After studying the PLINK manual, I tried to do this with the following command:

plink  --make-bed --file purebred411_qc --freqx --out freqx_411

And I got this output:

CHR SNP         A1  A2  C(HOM A1)   C(HET)  C(HOM A2)   C(HAP A1)   C(HAP A2)   C(MISSING)
1   AX-85111653 3   1   45           187     178         0          0            1
1   AX-85043398 2   4   45           186     180         0          0            0
1   AX-85051079 4   2   5            71      335         0          0            0
1   AX-85154093 4   2   5            72      332         0          0            2
1   AX-85063459 3   1   56           199     155         0          0            1

So, first, I want to know whether I have calculated the heterozygosity for each SNP correctly.

Second, if so, how can I calculate the percentage heterozygosity of each SNP?

Best regards


21 months ago, Kevin Blighe wrote:

Hello Mostafa,

Here is what the plink manual states:

Allele frequency

  • --freq <counts | case-control> <gz>
  • --freqx <gz> (alias: --frqx)

By itself, --freq writes a minor allele frequency report to plink.frq. If you add the 'counts' modifier, an allele count report is written to plink.frq.count instead. Alternatively, you can use --freq with --within/--family to write a cluster-stratified frequency report to plink.frq.strat, or use the 'case-control' modifier to write a case/control phenotype-stratified report to

--freqx writes a more informative genotype count report to plink.frqx.

For both flags, gzipped output can be requested with the 'gz' modifier.

Nonfounders are normally excluded from these counts/frequencies; use --nonfounders to change this.

All of these reports (except for --freq + --within/--family) are valid input for --read-freq; --freqx is the most powerful when used in that capacity, since it preserves deviation from Hardy-Weinberg equilibrium.



You used --freqx. Here is a description of the output:

.frqx (genotype count report)

Produced by --freqx. Valid input for --read-freq.

A text file with a header line, and then one line per variant with the following ten fields:

  • CHR Chromosome code
  • SNP Variant identifier
  • A1 Allele 1 (usually minor)
  • A2 Allele 2 (usually major)
  • C(HOM A1) A1 homozygote count
  • C(HET) Heterozygote count
  • C(HOM A2) A2 homozygote count
  • C(HAP A1) Haploid A1 count (includes male X chromosome)
  • C(HAP A2) Haploid A2 count
  • C(MISSING) Missing genotype count
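
Given the fields above, one way to get a per-SNP heterozygosity percentage is to divide C(HET) by the number of called diploid genotypes, C(HOM A1) + C(HET) + C(HOM A2). The sketch below does this in Python. Note the assumptions: this is the usual definition of observed heterozygosity, not something the PLINK manual excerpt states; missing and haploid counts are left out of the denominator; and the function names (het_percent, parse_frqx_rows) are made up for illustration. The sample rows are copied from the question.

```python
# Sample rows copied from the question's output. A real plink.frqx file is
# tab-delimited; here the data rows are simply split on whitespace.
SAMPLE = """\
CHR SNP A1 A2 C(HOM A1) C(HET) C(HOM A2) C(HAP A1) C(HAP A2) C(MISSING)
1 AX-85111653 3 1 45 187 178 0 0 1
1 AX-85043398 2 4 45 186 180 0 0 0
1 AX-85051079 4 2 5 71 335 0 0 0
"""


def het_percent(hom_a1, het, hom_a2):
    """Observed heterozygosity (%) among called diploid genotypes.

    Assumption: missing and haploid genotypes are excluded from the denominator.
    """
    called = hom_a1 + het + hom_a2
    if called == 0:
        return float("nan")
    return 100.0 * het / called


def parse_frqx_rows(lines):
    """Yield (snp_id, heterozygosity_percent) for each data row."""
    rows = iter(lines)
    next(rows)  # skip the header line (its column names contain spaces)
    for line in rows:
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or malformed lines
        snp = fields[1]
        hom_a1, het, hom_a2 = (int(x) for x in fields[4:7])
        yield snp, het_percent(hom_a1, het, hom_a2)


if __name__ == "__main__":
    for snp, pct in parse_frqx_rows(SAMPLE.splitlines()):
        print(f"{snp}\t{pct:.2f}")
```

For the first SNP in the question this gives 187 / (45 + 187 + 178) = 187 / 410, or about 45.6%. To run it on a real file, pass an open file handle for the .frqx output instead of SAMPLE.splitlines().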



Final piece of information: it looks like your bases are encoded in 1,2,3,4 format (A,C,G,T == 1,2,3,4).
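
If it helps when reading the A1 and A2 columns, the numeric codes can be translated back to bases with a small lookup. This is only a sketch that assumes the mapping stated above (1=A, 2=C, 3=G, 4=T); decode_allele is a made-up helper name, and any other code (for example 0 for missing) is passed through unchanged. PLINK 1.9 also documents an --alleleACGT flag for this kind of recoding, so check the manual for your version if you would rather convert the dataset itself.

```python
# Assumed mapping, as stated in the answer: A,C,G,T == 1,2,3,4.
CODE_TO_BASE = {"1": "A", "2": "C", "3": "G", "4": "T"}


def decode_allele(code):
    """Return the base letter for a 1/2/3/4 allele code; leave other codes as-is."""
    return CODE_TO_BASE.get(code, code)


if __name__ == "__main__":
    # First SNP in the question has A1=3 and A2=1.
    print(decode_allele("3"), decode_allele("1"))
```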

So, now you should understand your output and, I believe, you will know whether or not you have chosen the correct program / command.

