According to plink documentation, a hardcall is saved when the distance from the nearest hardcall, defined as **0.5 * sum_i |x_i - round(x_i)|** (where the x_i's are 0..2 allele dosages), is not greater than 0.1.

Truthfully, this doesn't make sense to me (especially why x_i is being summed) and was hoping if someone could help break down math.

Many thanks!

So lets say you had single dosages representing only 2 alleles.

Example 1:A/T = 1.900

0.5* (|1.900-2| + |0.100 - 0|)

= 0.5 * (0.100 + 0.100)

= 0.5 * 0.200

= 0.100

As this is not greater than 0.1, we would code the dosage as 2 (for TT).

Example 2:A/T = 0.980

0.5 * (|0.980 -1| + |1.02 - 1|)

=0.5 * (0.020 + 0.020)

=0.5 * 0.040

= 0.020

In this example, we would code it as A and T (heterogeneous).

My concern is how do you "decide" which alleles are present. Looking at the examples and my first example, it seems obvious. But I want to be able to program logic in a python script to check my understanding and compare my results against Plink.

I appreciate your help!

Just look at when round(x_i) isn't zero.

Clever. =)

Thank you