What is a suitable LD threshold for LD pruning of SNP data?
12 months ago
Colari19 ▴ 60


I have SNP data for approximately 600,000 SNPs that I'll be using for eQTL analysis.

I've been advised to derive a pruned set of SNPs that are in approximate linkage equilibrium, i.e. with high linkage disequilibrium (LD) between SNPs removed.

I've used the SNPRelate::snpgdsLDpruning function in R to do this, using an LD threshold of 0.2.

Is this an appropriate threshold to use?

This takes our 600,000 SNPs down to ~60,000; using a threshold of 0.1 leaves ~20,000 SNPs.

This is quite a big difference, so I'm wondering if 0.1-0.2 is too stringent.

Is there a more-or-less standard threshold that is used for LD pruning?

Thank you.

SNP eqtl LD • 1.4k views
12 months ago
reza.jabal ▴ 470

The optimal r^2 threshold for LD pruning depends heavily on the population history of your study subjects. Unlike D', which purely measures the non-random association of alleles at two or more loci, r^2 also depends on allele frequencies. For example, take two polymorphic loci (A and B) in complete LD, one with a 50% allele frequency and the other with a 1% allele frequency: D' would be 1, but r^2 would be only about 0.01. In other words, although the two loci are in complete LD, allele B is so rare that 99% of haplotypes do not carry it together with allele A.
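To make the arithmetic behind that example concrete, here is a minimal Python sketch. It assumes the standard two-locus definitions (D = p_AB - p_A * p_B, D' = |D| / D_max, r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B))) and that the rare allele B always sits on an A haplotype; the function name `ld_stats` is just an illustrative choice, not part of any package.

```python
def ld_stats(p_a, p_b, p_ab):
    """Return (D, D', r^2) for two biallelic loci.

    p_a, p_b: frequencies of alleles A and B.
    p_ab:     frequency of the haplotype carrying both A and B.
    """
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D (standard Lewontin normalisation).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Assumed scenario: A at 50%, B at 1%, and every B allele occurs on an A haplotype.
d, d_prime, r2 = ld_stats(p_a=0.5, p_b=0.01, p_ab=0.01)
print(f"D' = {d_prime:.2f}, r^2 = {r2:.4f}")
```

Under these assumptions D' comes out at 1 while r^2 is roughly 0.01, which is why an r^2-based pruning threshold treats such a pair as nearly independent.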

To avoid such situations, a minor allele frequency (MAF) filter is applied before the pairwise LD calculation to remove rare alleles. If your study population is not truly representative of the extant population, this filtering step may not be sufficient, since rare alleles can rapidly drift to higher frequencies in structured or founder populations. Although I assume these scenarios go beyond your question, in such cases you would want to use a more stringent r^2 threshold to avoid collinearity of effects among your pruned SNPs.
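As a rough illustration of how a MAF filter and an r^2 threshold interact, here is a toy Python sketch of greedy pruning. It is not the algorithm used by SNPRelate (which works within a sliding window and offers several LD measures); it simply assumes genotypes coded as 0/1/2 allele counts, uses the squared Pearson correlation of those counts as the r^2 estimate, and compares each SNP against every SNP already kept. The function names, the toy genotypes, and the 0.05 MAF cutoff are all made up for the example.

```python
from statistics import mean


def maf(genotypes):
    """Minor allele frequency from 0/1/2 allele counts."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1 - freq)


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov * cov / (vx * vy)


def prune(snps, maf_min=0.05, r2_max=0.2):
    """Keep a SNP only if it passes the MAF filter and is not in
    LD (r^2 > r2_max) with any SNP that has already been kept."""
    kept = []
    for name, g in snps.items():
        if maf(g) < maf_min:  # also drops monomorphic SNPs (zero variance)
            continue
        if all(r_squared(g, snps[k]) <= r2_max for k in kept):
            kept.append(name)
    return kept


# Toy data: 8 samples, 4 SNPs (purely illustrative).
snps = {
    "snp1": [0, 1, 2, 0, 1, 2, 0, 1],
    "snp2": [0, 1, 2, 0, 1, 2, 0, 2],  # near-duplicate of snp1
    "snp3": [1, 1, 1, 0, 0, 0, 2, 2],  # weakly correlated with snp1
    "snp4": [0, 0, 0, 0, 0, 0, 0, 0],  # monomorphic, removed by the MAF filter
}

print(prune(snps, r2_max=0.2))
print(prune(snps, r2_max=0.9))
```

Lowering `r2_max` can only shrink the kept set, which matches the drop in SNP count seen when moving from a threshold of 0.2 to 0.1.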

Conversely, if you're dealing with a population with massive haplotype diversity (such as sub-Saharan Africans) and you suspect your study sample is not large enough to represent all haplotypes in the population, you may want to use a more relaxed r^2 threshold for pruning.

Overall, as long as you can justify your choice of r^2, either threshold is acceptable, but r^2 < 0.2 is common practice for European populations.


Thank you for your detailed and thoughtful reply.

