How to interpret heterozygosity rate?
7.3 years ago
NB

Hello,

I have replication data for 12 markers in ~2300 unrelated cases and ~1200 controls. I want to check the heterozygosity rate to see whether I should exclude any individuals. I used PLINK:

plink --bfile QC_file --het --out QC_het


and then calculated the rate as (N(NM) - O(HOM))/N(NM), i.e. the fraction of non-missing genotypes that are heterozygous.

With this, the heterozygosity rate ranges from 0 to 0.23.

Heterozygosity rate   0      0.05   0.1   0.2
No. of individuals    2924   553    33    1
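The rate calculation described above can be sketched in Python as follows. This is a minimal sketch, assuming the whitespace-delimited .het layout that PLINK 1.9 writes (header FID IID O(HOM) E(HOM) N(NM) F); the function name het_rates and the two example rows are hypothetical, not data from this study.

```python
def het_rates(het_text):
    """Map (FID, IID) -> heterozygosity rate from PLINK --het output text.

    rate = (N(NM) - O(HOM)) / N(NM): the share of non-missing genotypes
    that are heterozygous.
    """
    lines = het_text.strip().splitlines()
    header = lines[0].split()
    col = {name: i for i, name in enumerate(header)}  # look columns up by name
    rates = {}
    for line in lines[1:]:
        fields = line.split()
        o_hom = float(fields[col["O(HOM)"]])
        n_nm = float(fields[col["N(NM)"]])
        key = (fields[col["FID"]], fields[col["IID"]])
        # Guard against individuals with no called genotypes.
        rates[key] = (n_nm - o_hom) / n_nm if n_nm > 0 else float("nan")
    return rates


# Two hypothetical rows in the .het layout (12 markers, none missing).
EXAMPLE_HET = """\
 FID  IID  O(HOM)  E(HOM)  N(NM)      F
  f1   s1      12   11.20     12      1
  f2   s2      10   11.20     12   -1.5
"""

rates = het_rates(EXAMPLE_HET)
print(rates)  # f1/s1 -> 0.0, f2/s2 -> 2/12 (about 0.167)
```

Looking columns up by header name rather than position keeps the sketch working if the column order differs between PLINK versions (an assumption worth checking against your own output).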


The inbreeding coefficient (F) is mostly 1, and for some individuals it is negative.

I am not sure how to interpret this, or how to filter individuals based on this range. Any help is appreciated.

Many thanks

7.3 years ago
dora

This may suggest that only one individual (the one in the 0.2 bin) shows a relatively large deviation in autosomal heterozygosity.
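To turn rates into an exclusion list, one convention often used in GWAS QC (not something proposed in this thread) is to flag individuals whose rate lies more than 3 standard deviations from the sample mean. A minimal sketch, assuming the rates are held in a dict keyed by (FID, IID); flag_het_outliers and the example data are hypothetical. With only 12 markers each rate is a coarse fraction k/N(NM), so any cut-off like this should be treated with caution.

```python
import statistics


def flag_het_outliers(rates, n_sd=3.0):
    """Return the sorted IDs whose heterozygosity rate lies more than
    n_sd sample standard deviations from the mean rate."""
    values = list(rates.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; needs at least two values
    low, high = mean - n_sd * sd, mean + n_sd * sd
    return sorted(key for key, r in rates.items() if r < low or r > high)


# Hypothetical data: 20 individuals with rate 0.0 and one with rate 0.2.
rates = {(f"f{i}", f"s{i}"): 0.0 for i in range(20)}
rates[("f20", "s20")] = 0.2

outliers = flag_het_outliers(rates)
print(outliers)  # [('f20', 's20')]
```

The threshold n_sd is a parameter so that a stricter or looser cut-off can be tried; on a small dataset it may be preferable to inspect the flagged individuals rather than drop them automatically.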

"The inbreeding coefficient rate is mostly 1": do you mean that the values in the last column of the .het output file are almost 1?


Yes, the final column, "F", is mostly 1, and some values are negative.


Sorry, then my answer would be the other way around: almost all of the samples are heterogeneous (the population is substantially heterogeneous and may include a number of ethnic groups). Otherwise, the samples are contaminated.


The samples are from 3 different countries, and the analysis is corrected for geographic origin. I am just not sure whether it is sensible to remove individuals or to keep them, given that the dataset is really small.


What does the negative value of F indicate?
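For reference, the F column in PLINK 1.9 --het output is a method-of-moments estimate, F = (O(HOM) - E(HOM)) / (N(NM) - E(HOM)), where E(HOM) is the number of homozygous genotypes expected from allele frequencies. The sketch below (function name and input values hypothetical) shows where the two patterns in this thread come from: F = 1 when every non-missing genotype is homozygous, and F < 0 when an individual has fewer homozygous genotypes (i.e. more heterozygous ones) than expected.

```python
def method_of_moments_f(o_hom, e_hom, n_nm):
    """F as in the last column of PLINK's --het output:
    (observed - expected homozygous count) / (non-missing count - expected)."""
    denom = n_nm - e_hom
    if denom == 0:
        return float("nan")  # undefined when expected homozygosity is total
    return (o_hom - e_hom) / denom


# Hypothetical values for 12 markers, all genotyped (N(NM) = 12).
f_all_hom = method_of_moments_f(12, 11.2, 12)   # all homozygous
f_excess_het = method_of_moments_f(10, 11.2, 12)  # fewer homozygotes than expected
print(f_all_hom)     # 1.0
print(f_excess_het)  # negative, about -1.5
```

Because E(HOM) is close to N(NM) when markers are not very polymorphic, the denominator is small and a change of one or two genotypes swings F widely; with 12 markers this makes individual F values very noisy.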