Heterozygosity by individual
8.3 years ago
Li • 0

How to compute heterozygosity for each individual?

Currently I use:

plink --bfile file --hardy --out file --noweb

but I need .hwe files for each individual separately in a dataset of hundreds of individuals.

plink hardy individual • 9.9k views
8.3 years ago

--hardy requires many individuals to judge deviation from Hardy-Weinberg equilibrium. Single-individual .hwe files will not be useful.

If you are trying to perform quality control on samples, consider plotting top principal components (computed with EIGENSOFT 6, or plink --pca) and removing extreme outliers.


I need to compute heterozygosity per individual from genotype data. Not for quality control.


Ah. --het (https://www.cog-genomics.org/plink2/basic_stats#ibc) should come in handy, then.


Thanks. Two problems:

  1. --het computes observed and expected autosomal homozygous genotype counts for each sample. However, I need heterozygosity for the X chromosome too (separately). --chr 23 --het gives zero o(hom) and e(hom). My sample is females only.

  2. --het gives an observed heterozygosity of 40% for my example individual, whereas the average o(het) over the autosomal SNPs of the same individual, taken from a .hwe file, is 30%. For expected heterozygosity the numbers are 36% vs 16%. So the two methods give very different answers, although, if I understand correctly, they should measure the same thing.

For context: the goal is to compare X-chromosome and autosomal heterozygosity between groups, and to run a t-test I need X and autosomal heterozygosity per individual. I already have the average X and autosomal heterozygosity per group.
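The homozygous counts that --het reports can be converted to per-sample heterozygosity rates by subtracting them from the number of non-missing genotypes. A minimal sketch in Python, assuming the PLINK 1.9 .het column layout (FID, IID, O(HOM), E(HOM), N(NM), F); the function name `het_rates` and the two sample rows are made up for illustration:

```python
def het_rates(het_text):
    """Return {(FID, IID): (observed_het, expected_het)} from .het file text."""
    lines = [ln.split() for ln in het_text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    col = {name: i for i, name in enumerate(header)}  # column name -> index
    rates = {}
    for r in rows:
        n = float(r[col["N(NM)"]])  # non-missing genotype count
        if n == 0:
            continue  # no usable genotypes for this sample
        o_hom = float(r[col["O(HOM)"]])
        e_hom = float(r[col["E(HOM)"]])
        # heterozygous count = non-missing count minus homozygous count
        rates[(r[col["FID"]], r[col["IID"]])] = ((n - o_hom) / n, (n - e_hom) / n)
    return rates


# Illustrative input only, not real data.
example = """FID IID O(HOM) E(HOM) N(NM) F
fam1 ind1 600 640.0 1000 -0.1111
fam1 ind2 700 640.0 1000 0.1667
"""

for sample, (obs, exp) in het_rates(example).items():
    print(sample, round(obs, 3), round(exp, 3))
```

With a real file, the text would come from something like `open("file.het").read()`.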


It's a dirty hack, but if your .bim file has numeric chromosome codes, you can force the X chromosome to be treated like an autosome by specifying a species with more chromosomes (e.g. "--dog").


That works, but I still need to compute this using --hardy too, per individual. Creating a text file with one person and using --keep myfile.txt is not practical because of the high number of individuals.
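One way to avoid a separate PLINK run per person is to count heterozygous calls directly from an additive-coded export. In PLINK 1.9, `plink --bfile file --recode A --out file` writes a .raw file with one row per individual and 0/1/2/NA per SNP. The sketch below assumes that layout (six leading columns FID, IID, PAT, MAT, SEX, PHENOTYPE); the function name `per_individual_het` and the toy rows are illustrative. It returns the observed fraction of heterozygous genotypes among non-missing calls, not a Hardy-Weinberg test.

```python
def per_individual_het(raw_lines):
    """Return {(FID, IID): fraction of non-missing genotypes coded 1 (heterozygous)}."""
    it = iter(raw_lines)
    next(it, None)  # skip the header line
    result = {}
    for line in it:
        fields = line.split()
        if not fields:
            continue
        fid, iid = fields[0], fields[1]
        # genotype columns start after FID IID PAT MAT SEX PHENOTYPE
        calls = [g for g in fields[6:] if g != "NA"]
        if calls:
            result[(fid, iid)] = sum(g == "1" for g in calls) / len(calls)
    return result


# Illustrative input only, not real data.
toy_raw = [
    "FID IID PAT MAT SEX PHENOTYPE rs1_A rs2_G rs3_T rs4_C",
    "fam1 ind1 0 0 2 -9 0 1 2 1",
    "fam1 ind2 0 0 2 -9 1 NA 0 0",
]

for sample, het in per_individual_het(toy_raw).items():
    print(sample, round(het, 3))
```

Because the function reads line by line, an open file handle can be passed in place of the list. Exporting once with --chr 23 and once with --autosome should give separate X and autosomal values per individual.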

5.0 years ago
Johan Zicola ▴ 70

VCFtools has a --het option, e.g. vcftools --vcf input.vcf --het --out output

--het

Calculates a measure of heterozygosity on a per-individual basis. Specifically, the inbreeding coefficient, F, is estimated for each individual using a method of moments. The resulting file has the suffix ".het".
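The method-of-moments estimate compares the observed number of homozygous sites with the number expected under Hardy-Weinberg proportions: F = (O(HOM) - E(HOM)) / (N - E(HOM)), where N is the number of sites with a genotype call. A small Python sketch of that arithmetic, assuming the usual .het columns (INDV, O(HOM), E(HOM), N_SITES, F); the function name `inbreeding_f` and the input values are hypothetical:

```python
def inbreeding_f(o_hom, e_hom, n_sites):
    """Method-of-moments inbreeding coefficient from homozygous site counts."""
    denom = n_sites - e_hom
    if denom == 0:
        raise ValueError("expected homozygous count equals the number of sites")
    return (o_hom - e_hom) / denom


# Hypothetical counts: 700 observed and 640 expected homozygous sites out of 1000.
print(round(inbreeding_f(700, 640.0, 1000), 4))
```

A negative F means the individual has more heterozygous sites than expected, and a positive F means fewer.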

Check the full manual: https://vcftools.github.io/man_latest.html

Check this post if you have difficulty interpreting the results: Is the heterozygosity flag (--het) in vcftools calculate observed and expected heterozygosity?
