5 days ago

curious

Running the command `plink --het` gives a column "F".

I read that F is essentially "1 - (HI/HS), where HI represents the individual's heterozygosity, and HS the subpopulation's heterozygosity".

From this definition it would seem that the lower a sample's F value, the higher its heterozygosity (e.g. a sufficiently low F might indicate contamination, and a sufficiently high F might indicate inbreeding). Is that right?
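To make the sign convention concrete, here is a minimal sketch of how I understand F to be computed. It assumes the PLINK 1.x `.het` columns `O(HOM)`, `E(HOM)` and `N(NM)`, and that F = (O(HOM) - E(HOM)) / (N(NM) - E(HOM)), which is algebraically the same as 1 - (observed heterozygosity / expected heterozygosity). The function names and the counts are made up for illustration.

```python
def inbreeding_f(obs_hom: float, exp_hom: float, n_sites: int) -> float:
    """F from homozygote counts (assumed .het columns O(HOM), E(HOM), N(NM))."""
    return (obs_hom - exp_hom) / (n_sites - exp_hom)


def f_from_heterozygosity(h_ind: float, h_exp: float) -> float:
    """The same quantity written as 1 - (HI / HS)."""
    return 1.0 - h_ind / h_exp


# Hypothetical sample: 1000 non-missing genotypes, 700 homozygotes expected.
n_sites, exp_hom = 1000, 700.0

# Fewer homozygotes than expected (excess heterozygosity) gives a negative F.
excess_het = inbreeding_f(650, exp_hom, n_sites)

# More homozygotes than expected (e.g. inbreeding) gives a positive F.
excess_hom = inbreeding_f(760, exp_hom, n_sites)

# Same excess-heterozygosity sample expressed as heterozygosity rates.
h_ind = (n_sites - 650) / n_sites      # observed heterozygosity
h_exp = (n_sites - exp_hom) / n_sites  # expected heterozygosity
same_as_excess_het = f_from_heterozygosity(h_ind, h_exp)

print(excess_het, excess_hom, same_as_excess_het)
```

If this reading is correct, a strongly negative F means more heterozygous calls than expected, and a strongly positive F means fewer.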

I am also wondering what a "normal" range of F is for a randomly sampled population. Here they say to remove samples that are more than 3 standard deviations (SD) from the mean, but what is a typical mean? 0.018?
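For reference, this is roughly how I would apply the 3 SD rule. It is only a sketch: the sample IDs and F values are invented, and `flag_het_outliers` is a hypothetical helper, not part of PLINK. In practice the values would come from the `F` column of the `.het` output.

```python
import statistics


def flag_het_outliers(f_by_sample: dict[str, float], n_sd: float = 3.0) -> list[str]:
    """Return sample IDs whose F lies more than n_sd sample SDs from the mean."""
    values = list(f_by_sample.values())
    mean_f = statistics.mean(values)
    sd_f = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [
        sample_id
        for sample_id, f in f_by_sample.items()
        if abs(f - mean_f) > n_sd * sd_f
    ]


# Invented data: 20 samples with F near zero and one sample with a high F.
f_values = {f"S{i}": (0.01 if i % 2 else -0.01) for i in range(1, 21)}
f_values["S21"] = 0.30

outliers = flag_het_outliers(f_values)
print(outliers)
```

Note that the mean and SD here are computed from the same samples being filtered, so a single extreme sample inflates the SD. With very few samples no value can exceed 3 SD at all.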