How accurate is the IBD calculation by plink?
5.8 years ago
MAPK ★ 1.9k

I was trying to calculate IBD values for about 100 individuals, all likely to be unrelated. I tried the plink tool ( http://pngu.mgh.harvard.edu/~purcell/plink/ibdibs.shtml ), but it looks like it generates too many false positives (high IBD for unrelated individuals). One sample has IBD = 1 with at least 5 other samples (I am looking at the Z0 values). Can someone please explain to me what these values mentioned on their website mean:

Z0  P(IBD=0)
Z1  P(IBD=1)
Z2  P(IBD=2)
PI_HAT  Proportion IBD, i.e. P(IBD=2) + 0.5*P(IBD=1)


Is it possible that your supposedly unrelated individuals are actually related, or that there were sample swaps?


It's also known that PLINK's IBS calculations aren't that great. The kcoeff paper has some comparisons.


Have you QC'ed your genotypes as carefully as you would for a GWAS analysis? Poor-quality genotypes will give you wrong estimates, but that is not the fault of the IBD method.

5.8 years ago

These are not false positives!

In fact, they are not positives at all. As you yourself wrote, Z0 is the probability that at a given locus 0 alleles are identical by descent. In other words, if your samples are unrelated, you should expect a Z0 close to 1.

PI_HAT summarizes the overall proportion of alleles shared IBD. If your samples are unrelated, you should expect a PI_HAT close to 0.

Z0, Z1, and Z2 segregate out the probabilities of having IBD of 0, 1, or 2 over the loci, which gives you a way of discriminating between relationship types. Ideal parent-offspring has (Z0, Z1, Z2) = (0, 1, 0), i.e. all loci have one allele identical by descent; ideal full sibling = (1/4, 1/2, 1/4), i.e. 25% of loci have 0 alleles IBD, 50% have 1 allele IBD, 25% have 2 alleles IBD; etc.
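
To make the arithmetic concrete, here is a minimal Python sketch of the PI_HAT formula quoted in the question, applied to idealised (Z0, Z1, Z2) values. The parent-offspring and full-sibling values are the ones given above. The unrelated and duplicate/MZ-twin rows are the standard textbook values, added here as an assumption for illustration. Real estimates from PLINK will be noisier than these ideals.

```python
def pi_hat(z0, z1, z2):
    """Proportion IBD as defined on the PLINK page: P(IBD=2) + 0.5 * P(IBD=1)."""
    if abs(z0 + z1 + z2 - 1.0) > 1e-6:
        raise ValueError("Z0, Z1 and Z2 are probabilities and should sum to 1")
    return z2 + 0.5 * z1


# Idealised (Z0, Z1, Z2) values; the last two rows are textbook values
# added for illustration, not taken from the thread.
IDEAL = {
    "parent-offspring": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
    "unrelated": (1.0, 0.0, 0.0),
    "duplicate or MZ twin": (0.0, 0.0, 1.0),
}

for name, z in IDEAL.items():
    print(f"{name:22s} Z={z}  PI_HAT={pi_hat(*z):.2f}")
```

Note that parent-offspring and full siblings both come out at the same PI_HAT, which is why the individual Z values are needed to tell them apart.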


Thanks. So do I need to look at PI_HAT, which is supposed to range from 0 to 1, to get the actual relationship between individuals?


Yes, PI_HAT is a summary statistic that will give you overall IBD proportion. But Z0, Z1, and Z2 are also helpful to understand for distinguishing between relationship types, so it's useful to take the time to understand what all four measures mean.
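
As a rough illustration of how the three Z values discriminate between relationship types, the sketch below labels a pair by the idealised (Z0, Z1, Z2) vector it sits closest to. This nearest-ideal rule, the function name `closest_relationship`, and the small table of ideals are all assumptions made for the example. It is not how PLINK assigns relationships, and a real analysis would want thresholds tuned to the data.

```python
import math

# Idealised (Z0, Z1, Z2) per relationship; illustrative, not exhaustive.
IDEAL_Z = {
    "unrelated": (1.0, 0.0, 0.0),
    "parent-offspring": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
    "duplicate or MZ twin": (0.0, 0.0, 1.0),
}


def closest_relationship(z0, z1, z2):
    """Return the label whose ideal (Z0, Z1, Z2) is nearest in Euclidean distance."""
    observed = (z0, z1, z2)
    return min(IDEAL_Z, key=lambda name: math.dist(observed, IDEAL_Z[name]))


# Both of these pairs have PI_HAT near 0.5, but their Z vectors differ.
print(closest_relationship(0.02, 0.97, 0.01))
print(closest_relationship(0.27, 0.48, 0.25))
```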


Thanks, but the PI_HAT values don't make sense at all (unless I am doing something wrong). I am getting 0 for the same individuals, where it is supposed to be 1 (IBD = 1 when compared to the same individual or a monozygotic twin?)


You may be confusing Z0, Z1, Z2, and PI_HAT. First take some time to understand their relationship.
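
One way to see the difference is to read the pairwise output and look at Z0 and PI_HAT side by side. The sketch below assumes whitespace-delimited output with the column names used in PLINK 1.x `.genome` files (trimmed here to the first ten columns). The three data rows are invented for illustration, and the 0.2 cutoff is an arbitrary assumption, not a recommended threshold.

```python
# Invented example rows in a trimmed .genome-style layout.
SAMPLE = """\
FID1 IID1 FID2 IID2 RT EZ Z0 Z1 Z2 PI_HAT
F1 S1 F2 S2 UN NA 1.0000 0.0000 0.0000 0.0000
F3 S3 F4 S4 UN NA 0.0000 1.0000 0.0000 0.5000
F5 S5 F6 S6 UN NA 0.0000 0.0000 1.0000 1.0000
"""


def read_pairs(text):
    """Parse whitespace-delimited text with a header row into a list of dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def related_pairs(rows, min_pi_hat=0.2):
    """Return (IID1, IID2, PI_HAT) for pairs at or above the chosen cutoff."""
    return [
        (row["IID1"], row["IID2"], float(row["PI_HAT"]))
        for row in rows
        if float(row["PI_HAT"]) >= min_pi_hat
    ]


for row in read_pairs(SAMPLE):
    print(row["IID1"], row["IID2"], "Z0 =", row["Z0"], "PI_HAT =", row["PI_HAT"])
print(related_pairs(read_pairs(SAMPLE)))
```

In this toy data the S1/S2 pair has Z0 = 1 and PI_HAT = 0, which is what an unrelated pair looks like, while the S5/S6 pair has Z2 = 1 and PI_HAT = 1, which is what a duplicate looks like.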


Where is a good place to start to understand this relationship? It seems like the plink documentation would rather give 5 hints than 1 explanation.


Also, I am using only 27000 SNPs (LD pruned and quality filtered) for 150 samples. Do you think the number of SNPs is the issue here?


Supposedly, that should be enough.

5.8 years ago

If you want an independent method to compare against, I suggest trying kcoeff, which estimates k0, k1, and k2, the proportions of the genome shared IBS0/1/2.


Thanks. I am getting IBD = 1 between one sample and multiple other samples, which can't be true unless the samples are duplicated.


You should be able to tell if the samples are exactly duplicated by looking at the data. Otherwise, they might have been duplicated during sample handling before the genotyping.
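
A simple way to look at the data for this is genotype concordance between the two samples. The sketch below assumes each sample is a list of genotypes coded as 0, 1, or 2 copies of an allele, with `None` for a missing call, in the same SNP order. That encoding and the function name `concordance` are assumptions for illustration. A true duplicate should be concordant at nearly every SNP called in both samples, allowing for genotyping error.

```python
def concordance(genotypes_a, genotypes_b):
    """Fraction of SNPs with identical genotypes, among SNPs called in both samples."""
    if len(genotypes_a) != len(genotypes_b):
        raise ValueError("both samples must cover the same SNPs in the same order")
    compared = 0
    matches = 0
    for a, b in zip(genotypes_a, genotypes_b):
        if a is None or b is None:
            continue  # skip SNPs with a missing call in either sample
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no SNPs were called in both samples")
    return matches / compared


# Toy genotypes, invented for the example.
sample_1 = [0, 1, 2, None, 1]
sample_2 = [0, 1, 2, 2, 1]
print(concordance(sample_1, sample_2))
```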