Question: Calculating heterozygosity for each SNP
3 months ago by
mostafarafiepour60 wrote:

Hi all,

I want to calculate the heterozygosity for each SNP. After studying the PLINK guide, I calculated it using the following command:

plink  --make-bed --file purebred411_qc --freqx --out freqx_411

And I got this output:

CHR SNP         A1  A2  C(HOM A1)   C(HET)  C(HOM A2)   C(HAP A1)   C(HAP A2)   C(MISSING)
1   AX-85111653 3   1   45           187     178         0          0            1
1   AX-85043398 2   4   45           186     180         0          0            0
1   AX-85051079 4   2   5            71      335         0          0            0
1   AX-85154093 4   2   5            72      332         0          0            2
1   AX-85063459 3   1   56           199     155         0          0            1

So, first, I want to know whether I calculated the heterozygosity for each SNP correctly.

Second, if so, how can I calculate the percentage heterozygosity for each SNP?

Best regards,

Mostafa

3 months ago by
Kevin Blighe41k wrote:

Hello Mostafa,

Here is what the plink manual states:

Allele frequency

  • --freq <counts | case-control> <gz>
  • --freqx <gz> (alias: --frqx)

By itself, --freq writes a minor allele frequency report to plink.frq. If you add the 'counts' modifier, an allele count report is written to plink.frq.count instead. Alternatively, you can use --freq with --within/--family to write a cluster-stratified frequency report to plink.frq.strat, or use the 'case-control' modifier to write a case/control phenotype-stratified report to plink.frq.cc.

--freqx writes a more informative genotype count report to plink.frqx.

For both flags, gzipped output can be requested with the 'gz' modifier.

Nonfounders are normally excluded from these counts/frequencies; use --nonfounders to change this.

All of these reports (except for --freq + --within/--family) are valid input for --read-freq; --freqx is the most powerful when used in that capacity, since it preserves deviation from Hardy-Weinberg equilibrium.

[source: https://www.cog-genomics.org/plink/1.9/basic_stats#freq]

----------------------------------------------------

You used --freqx. Here is a description of the output:

.frqx (genotype count report)

Produced by --freqx. Valid input for --read-freq.

A text file with a header line, and then one line per variant with the following ten fields:

  • CHR Chromosome code
  • SNP Variant identifier
  • A1 Allele 1 (usually minor)
  • A2 Allele 2 (usually major)
  • C(HOM A1) A1 homozygote count
  • C(HET) Heterozygote count
  • C(HOM A2) A2 homozygote count
  • C(HAP A1) Haploid A1 count (includes male X chromosome)
  • C(HAP A2) Haploid A2 count
  • C(MISSING) Missing genotype count

[source: https://www.cog-genomics.org/plink/1.9/formats#frqx]
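Given those fields, the percentage of heterozygotes at each SNP can be derived from the three diploid genotype counts. The following is only a minimal sketch in Python, not a PLINK feature: the function name heterozygosity_from_frqx is made up for illustration, the sample rows are copied from the question, and it assumes that observed heterozygosity is defined as C(HET) divided by the number of non-missing diploid genotypes, with haploid and missing counts ignored. It also reports the Hardy-Weinberg expected value 2pq for comparison.

```python
# Rows copied from the .frqx output shown in the question.
SAMPLE = """\
CHR SNP         A1  A2  C(HOM A1)   C(HET)  C(HOM A2)   C(HAP A1)   C(HAP A2)   C(MISSING)
1   AX-85111653 3   1   45           187     178         0          0            1
1   AX-85043398 2   4   45           186     180         0          0            0
1   AX-85051079 4   2   5            71      335         0          0            0
1   AX-85154093 4   2   5            72      332         0          0            2
1   AX-85063459 3   1   56           199     155         0          0            1
"""


def heterozygosity_from_frqx(lines):
    """Yield (snp_id, observed_het, expected_het) for each data line.

    Assumption: only the diploid genotype counts are used; haploid and
    missing counts are left out of the denominator.
    """
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header line (its first field is "CHR").
        if not fields or fields[0] == "CHR":
            continue
        snp_id = fields[1]
        # Columns 5-7 of the report: C(HOM A1), C(HET), C(HOM A2).
        hom_a1, het, hom_a2 = (int(x) for x in fields[4:7])
        n_diploid = hom_a1 + het + hom_a2
        if n_diploid == 0:
            continue  # no called diploid genotypes at this SNP
        observed = het / n_diploid
        # A1 allele frequency from genotype counts, then 2pq under HWE.
        p = (2 * hom_a1 + het) / (2 * n_diploid)
        expected = 2 * p * (1 - p)
        yield snp_id, observed, expected


for snp_id, observed, expected in heterozygosity_from_frqx(SAMPLE.splitlines()):
    print(f"{snp_id}\tobserved={100 * observed:.2f}%\texpected={100 * expected:.2f}%")
```

To run this on a real report, the same function could be given an open file handle, for example open("freqx_411.frqx"), where the file name is inferred from the --out prefix in the question.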

----------------------------------------------------

Final piece of information: it looks like your bases are encoded in 1,2,3,4 format (A,C,G,T == 1,2,3,4).
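If letter codes are easier to read, the numeric alleles can be translated back with a simple lookup. This is a small illustrative sketch in Python that assumes the 1,2,3,4 == A,C,G,T mapping described above; the names ALLELE_CODES and decode_allele are hypothetical. PLINK 1.9 also documents an --alleleACGT flag for recoding the data set itself, which may be the more convenient route.

```python
# Assumed mapping, taken from the note above: 1,2,3,4 == A,C,G,T.
ALLELE_CODES = {"1": "A", "2": "C", "3": "G", "4": "T"}


def decode_allele(code):
    """Return the base letter for a numeric allele code.

    Codes outside the mapping (for example "0" for a missing allele)
    are returned unchanged.
    """
    return ALLELE_CODES.get(code, code)


# First SNP in the question: A1 is coded 3 and A2 is coded 1.
a1, a2 = decode_allele("3"), decode_allele("1")
print(a1, a2)
```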

So, now you should understand your output and, I believe, you will know whether or not you have chosen the correct program / command.

Kevin
