How accurate is the IBD calculation by plink?
5.8 years ago
MAPK ★ 1.9k

I was trying to calculate the IBD values for about 100 individuals, all likely to be unrelated. I tried the plink tool ( http://pngu.mgh.harvard.edu/~purcell/plink/ibdibs.shtml ), but it looks like it generates too many false positives (or high IBDs for unrelated individuals). I have one sample with at least 5 other samples with IBD = 1 (I am looking at Z0 values). Can someone please explain to me what these values mentioned on their website are:

Z0  P(IBD=0)
Z1  P(IBD=1)
Z2  P(IBD=2)
PI_HAT  Proportion IBD, i.e. P(IBD=2) + 0.5*P(IBD=1)

Is it possible that your supposedly unrelated individuals are actually related, or that there were sample swaps?

It's also known that PLINK's IBS calculations aren't that great. The kcoeff paper has some comparisons.

Have you carefully QC'ed your genotypes, as you would for a GWAS analysis? Poor-quality genotypes will give you wrong estimates, but that is not the fault of the IBD calculation.

5.8 years ago

These are not false positives!

In fact, they are not positives at all. As you yourself wrote, Z0 is the probability that at a given locus 0 alleles are identical by descent. In other words, if your samples are unrelated, you should expect a Z0 close to 1.

PI_HAT is a measure of overall IBD alleles. If your samples are unrelated, you should expect a PI_HAT close to 0.
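As a rough illustration of reading these columns together, here is a minimal Python sketch that parses text in the column layout of a PLINK `--genome` output file and lists pairs whose PI_HAT is not close to 0. The rows in `SAMPLE` are made up for illustration, and the `min_pi_hat=0.2` cutoff and the helper names are assumptions, not anything PLINK prescribes.

```python
# Made-up rows in the column layout of a PLINK --genome (.genome) file.
SAMPLE = """\
FID1 IID1 FID2 IID2 RT EZ Z0 Z1 Z2 PI_HAT PHE DST PPC RATIO
F1 S1 F2 S2 UN NA 1.0000 0.0000 0.0000 0.0000 -1 0.72 0.51 2.01
F1 S1 F3 S3 UN NA 0.0000 0.0000 1.0000 1.0000 -1 1.00 1.00 NA
F2 S2 F3 S3 UN NA 0.9500 0.0500 0.0000 0.0250 -1 0.73 0.48 1.98
"""


def read_genome(text):
    """Split whitespace-delimited text into one dict per pair, keyed by header."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def related_pairs(records, min_pi_hat=0.2):
    """Return (IID1, IID2, Z0, PI_HAT) for pairs at or above the cutoff.

    The cutoff is a hypothetical choice for this sketch.
    """
    return [
        (r["IID1"], r["IID2"], float(r["Z0"]), float(r["PI_HAT"]))
        for r in records
        if float(r["PI_HAT"]) >= min_pi_hat
    ]


pairs = related_pairs(read_genome(SAMPLE))
for iid1, iid2, z0, pi in pairs:
    print(iid1, iid2, "Z0 =", z0, "PI_HAT =", pi)
```

Note that the flagged pair has Z0 near 0, while the pairs with Z0 near 1 are the ones that look unrelated.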

Z0, Z1, and Z2 segregate out the probabilities of having IBD of 0, 1, or 2 over the loci, which gives you a way of discriminating between relationship types. Ideal parent-offspring has (Z0, Z1, Z2) = (0, 1, 0), i.e. all loci have one allele identical by descent; ideal full sibling = (1/4, 1/2, 1/4), i.e. 25% of loci have 0 alleles IBD, 50% have 1 allele IBD, 25% have 2 alleles IBD; etc.
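To make the link between the three Z values and PI_HAT concrete, here is a small Python sketch. It uses the PI_HAT formula quoted in the question and the idealized (Z0, Z1, Z2) triples above, plus (1, 0, 0) for unrelated pairs and (0, 0, 1) for duplicates or monozygotic twins. The nearest-match helper `closest_relationship` is a hypothetical illustration only; it is not how PLINK assigns relationships, and real estimates are noisy.

```python
# Idealized (Z0, Z1, Z2) values for a few relationship types.
IDEAL = {
    "unrelated": (1.0, 0.0, 0.0),
    "parent-offspring": (0.0, 1.0, 0.0),
    "full sibling": (0.25, 0.5, 0.25),
    "duplicate / monozygotic twin": (0.0, 0.0, 1.0),
}


def pi_hat(z1, z2):
    """PI_HAT = P(IBD=2) + 0.5 * P(IBD=1)."""
    return z2 + 0.5 * z1


def closest_relationship(z0, z1, z2):
    """Hypothetical helper: pick the ideal triple nearest to the estimate."""
    def sq_dist(name):
        e0, e1, e2 = IDEAL[name]
        return (z0 - e0) ** 2 + (z1 - e1) ** 2 + (z2 - e2) ** 2
    return min(IDEAL, key=sq_dist)


for name, (z0, z1, z2) in IDEAL.items():
    print(f"{name}: Z0={z0}, Z1={z1}, Z2={z2}, PI_HAT={pi_hat(z1, z2)}")
```

Parent-offspring and full-sibling pairs both come out at PI_HAT = 0.5, which is why the separate Z values are needed to tell them apart.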

Thanks. So do I need to compare PI_HAT, which is supposedly between 0 and 1, to get the actual relationships between the individuals?

Yes, PI_HAT is a summary statistic that will give you overall IBD proportion. But Z0, Z1, and Z2 are also helpful to understand for distinguishing between relationship types, so it's useful to take the time to understand what all four measures mean.

Thanks, but the PI_HAT values don't make sense at all (unless I am doing something wrong). I am getting 0 for the same individuals, where it is supposed to be 1 (IBD = 1 when comparing identical or monozygotic individuals?)

You may be confusing Z0, Z1, Z2, and PI_HAT. First take some time to understand their relationship.

Where is a good place to start to understand this relationship? It seems like the plink documentation would rather give 5 hints than 1 explanation.

Also, I am using only 27000 SNPs (LD pruned and quality filtered) for 150 samples. Do you think the number of SNPs is the issue here?

Supposedly, that should be enough.

5.8 years ago

If you want an independent method to compare against, I suggest trying kcoeff, which estimates k0, k1, and k2, the portions of the genome shared IBS0/1/2.

Thanks. I am getting IBD = 1 for one sample paired with multiple other samples. That can't be true unless the samples are duplicated.

You should be able to tell if the samples are exactly duplicated by looking at the data. Otherwise, they might have been duplicated during sample handling before the genotyping.
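One way to look at the data for exact duplicates is to compare genotype calls directly. The sketch below assumes genotypes are coded as 0/1/2 allele counts with `None` for missing calls; the function name and the toy vectors are made up for illustration.

```python
def concordance(g1, g2):
    """Fraction of SNPs called in both samples that have identical genotypes.

    Returns None when no SNP is called in both samples.
    """
    both = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not both:
        return None
    return sum(a == b for a, b in both) / len(both)


# Toy genotype vectors (made up): 0/1/2 allele counts, None = missing call.
sample_a = [0, 1, 2, 1, None, 0, 2, 1]
sample_b = [0, 1, 2, 1, 2, 0, 2, 1]
sample_c = [2, 0, 1, 1, 0, 2, 0, 1]

print("a vs b:", concordance(sample_a, sample_b))
print("a vs c:", concordance(sample_a, sample_c))
```

A pair that agrees at essentially every jointly called SNP is a candidate duplicate, while genuinely distinct individuals should disagree at a substantial fraction of sites.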