How accurate is the IBD calculation by plink?
5.8 years ago
MAPK ★ 1.9k

I was trying to calculate IBD values for about 100 individuals, all likely to be unrelated. I tried the plink tool ( http://pngu.mgh.harvard.edu/~purcell/plink/ibdibs.shtml ), but it looks like it generates too many false positives (i.e. high IBD for unrelated individuals). I have one sample that has IBD = 1 with at least 5 other samples (I am looking at the Z0 values). Can someone please explain to me what these values mentioned on their website are:

Z0  P(IBD=0)
Z1  P(IBD=1)
Z2  P(IBD=2)
PI_HAT  Proportion IBD, i.e. P(IBD=2) + 0.5*P(IBD=1)
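
As a quick illustration of how the last row relates to the other three, here is a minimal Python sketch that computes PI_HAT from Z1 and Z2 using the formula quoted above. The function name `pi_hat` and the idealized triples are assumptions made for illustration; they are not PLINK output.

```python
def pi_hat(z1: float, z2: float) -> float:
    """Proportion IBD as defined in the table: P(IBD=2) + 0.5 * P(IBD=1)."""
    return z2 + 0.5 * z1


# Idealized (Z0, Z1, Z2) triples for illustration only, not real PLINK output.
examples = {
    "unrelated": (1.0, 0.0, 0.0),
    "parent-offspring": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
    "duplicate / monozygotic twins": (0.0, 0.0, 1.0),
}

for label, (z0, z1, z2) in examples.items():
    print(f"{label}: Z0={z0}, PI_HAT={pi_hat(z1, z2)}")
```

Note that a Z0 of 1 corresponds to a PI_HAT of 0, so a high Z0 means low relatedness, not high.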
plink IBD • 9.4k views

Is it possible that your supposedly unrelated individuals are actually related, or that there were sample swaps?


It's also known that PLINK's IBS calculations aren't that great. The kcoeff paper has some comparisons.


Have you carefully QC'ed your genotypes, as you would for a GWAS analysis? Poor-quality genotypes will give you wrong estimates, but that is not the fault of the IBD calculation.

5.8 years ago

These are not false positives!

In fact, they are not positives at all. As you yourself wrote, Z0 is the probability that at a given locus 0 alleles are identical by descent. In other words, if your samples are unrelated, you should expect a Z0 close to 1.

PI_HAT is the overall proportion of alleles shared IBD. If your samples are unrelated, you should expect a PI_HAT close to 0.

Z0, Z1, and Z2 segregate out the probabilities of having IBD of 0, 1, or 2 over the loci, which gives you a way of discriminating between relationship types. Ideal parent-offspring has (Z0, Z1, Z2) = (0, 1, 0), i.e. all loci have one allele identical by descent; ideal full sibling = (1/4, 1/2, 1/4), i.e. 25% of loci have 0 alleles IBD, 50% have 1 allele IBD, 25% have 2 alleles IBD; etc.
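
One rough way to use these expectations is to compare an observed (Z0, Z1, Z2) triple against the ideal values and pick the closest. The sketch below does that with a plain Euclidean distance. The function name `closest_relationship` is hypothetical, the "unrelated", "second-degree" and "duplicate" rows are standard textbook expectations added here as assumptions (only parent-offspring and full sibling are given above), and nearest-ideal matching is a simplification rather than how PLINK or any other tool assigns relationships.

```python
import math

# Idealized (Z0, Z1, Z2) values. Parent-offspring and full sibling are taken
# from the answer above; the other rows are textbook expectations (assumption).
IDEAL = {
    "unrelated": (1.0, 0.0, 0.0),
    "parent-offspring": (0.0, 1.0, 0.0),
    "full sibling": (0.25, 0.5, 0.25),
    "second-degree (e.g. half sibling)": (0.5, 0.5, 0.0),
    "duplicate or monozygotic twin": (0.0, 0.0, 1.0),
}


def closest_relationship(z0: float, z1: float, z2: float) -> str:
    """Return the label whose ideal triple is nearest to the observed one."""
    observed = (z0, z1, z2)
    return min(IDEAL, key=lambda name: math.dist(IDEAL[name], observed))


print(closest_relationship(0.98, 0.02, 0.00))
print(closest_relationship(0.27, 0.48, 0.25))
```

Real estimates are noisy, so a pair that lands between two ideal triples deserves a closer look rather than a hard label.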


Thanks. So do I need to compare PI_HAT, which is supposedly between 0 and 1, to get the actual relationships between the individuals?


Yes, PI_HAT is a summary statistic that will give you overall IBD proportion. But Z0, Z1, and Z2 are also helpful to understand for distinguishing between relationship types, so it's useful to take the time to understand what all four measures mean.


Thanks, but the PI_HAT values don't make sense at all (unless I am doing something wrong). I am getting 0 for identical individuals, where it is supposed to be 1 (IBD = 1 when comparing the same or monozygotic individuals?).


You may be confusing Z0, Z1, Z2, and PI_HAT. First take some time to understand their relationship.


Where is a good place to start to understand this relationship? It seems like the plink documentation would rather give 5 hints than 1 explanation.


Also, I am using only 27000 SNPs (LD pruned and quality filtered) for 150 samples. Do you think the number of SNPs is the issue here?


Supposedly, that should be enough.

5.8 years ago

If you want an independent method to compare against, I suggest trying kcoeff, which estimates k0, k1, and k2, the proportions of the genome shared IBS0/1/2.


Thanks. For one sample I am getting IBD = 1 against multiple other samples. This can't be true unless the samples are duplicated.


You should be able to tell if the samples are exactly duplicated by looking at the data. Otherwise, they might have been duplicated during sample handling before the genotyping.
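
A minimal sketch of one way to look for duplicates in the pairwise output: scan the table for pairs whose PI_HAT is close to 1. It assumes a whitespace-separated table with a header row containing `IID1`, `IID2` and `PI_HAT` columns (real PLINK `.genome` files carry additional columns). The function name `likely_duplicates`, the 0.9 cutoff and the sample rows are all made up for illustration.

```python
def likely_duplicates(genome_text: str, min_pi_hat: float = 0.9):
    """Return (IID1, IID2) pairs whose PI_HAT is at or above the cutoff."""
    lines = [ln.split() for ln in genome_text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    # Look columns up by name so extra or reordered columns do not matter.
    col = {name: i for i, name in enumerate(header)}
    hits = []
    for row in rows:
        if float(row[col["PI_HAT"]]) >= min_pi_hat:
            hits.append((row[col["IID1"]], row[col["IID2"]]))
    return hits


# Made-up rows with a trimmed column set, for illustration only.
example = """
FID1 IID1 FID2 IID2 Z0 Z1 Z2 PI_HAT
F1 S1 F2 S2 1.0000 0.0000 0.0000 0.0000
F1 S1 F3 S3 0.0000 0.0000 1.0000 1.0000
F2 S2 F3 S3 0.9500 0.0500 0.0000 0.0250
"""

print(likely_duplicates(example))  # [('S1', 'S3')]
```

In this toy table the S1/S2 pair has Z0 = 1 and PI_HAT = 0, i.e. it looks unrelated, while S1/S3 has Z2 = 1 and PI_HAT = 1, which is the pattern expected for a duplicated sample.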
