What is a suitable LD threshold for LD pruning of snp data?
3.7 years ago
Colari19 ▴ 90

Hi,

I have SNP data for approximately 600,000 SNPs that I'll be using for eQTL analysis.

I've been advised to derive a pruned set of SNPs that are in approximate linkage equilibrium with each other, i.e. to prune on linkage disequilibrium (LD).

I've used the SNPRelate::snpgdsLDpruning function in R to do this, using an LD threshold of 0.2.

Is this an appropriate threshold to use?

This takes our 600,000 SNPs down to ~60,000; using a threshold of 0.1 leaves ~20,000 SNPs.

This is quite a big difference so I'm wondering if 0.1-0.2 is too stringent.

Is there a more-or-less standard threshold that is used for LD pruning?

Thank you.

SNP eqtl LD • 6.9k views
3.7 years ago
reza.jabal ▴ 580

The choice of an optimal r^2 threshold for LD pruning depends heavily on the population history of your study subjects. Unlike D', which is purely a measure of the non-random association of alleles at two or more loci, r^2 is also informed by allele frequencies. For example, take two polymorphic loci (A and B) in complete LD, one with an allele frequency of 50% and the other with a frequency of 1%: D' would be 1, but r^2 would only be about 0.01. This tells us that although the two loci are in complete LD with each other, allele B is so rare that 99% of haplotypes do not carry it at all, so the allele at locus A predicts the allele at locus B very poorly.
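To make the arithmetic in that example concrete, here is a minimal sketch using the standard definitions of D, D' and r^2 from haplotype frequencies. The function name `ld_stats` and its arguments are illustrative only, and the example assumes the rare allele B always sits on the same haplotype as allele A.

```python
def ld_stats(p_a, p_b, p_ab):
    """Return (D, D', r^2) for two biallelic loci.

    p_a, p_b: frequencies of allele A and allele B
    p_ab:     frequency of the haplotype carrying both A and B
    """
    d = p_ab - p_a * p_b
    # D' rescales D by the largest value it could take given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 divides D^2 by the product of all four allele frequencies
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Assumed scenario: allele A at 50%, allele B at 1%, and B only ever occurs with A,
# so the A-B haplotype frequency equals the frequency of B.
d, d_prime, r2 = ld_stats(p_a=0.5, p_b=0.01, p_ab=0.01)
print(f"D = {d:.4f}, D' = {d_prime:.2f}, r^2 = {r2:.4f}")
```

Under these assumptions D' comes out at 1 while r^2 is roughly 0.01, which is why an r^2-based pruning threshold treats this pair as nearly independent.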

To circumvent such situations, minor allele frequency (MAF) filtering is carried out prior to the pairwise LD calculation to remove rare alleles. Sometimes, if your study population is not truly representative of the extant population, this filtering step is not sufficient, since rare alleles can rapidly drift up to higher frequencies in structured or founder populations. I assume these scenarios go beyond your question, but in such cases you would want to use more stringent r^2 values to avoid collinearity of effects among your pruned SNPs.

Conversely, if you're dealing with a population with very high haplotype diversity (such as sub-Saharan Africans) and your study sample is not large enough to represent all haplotypes in that population, you may want to use a more relaxed r^2 threshold for pruning.

Overall, as long as you can rationalise your choice of r^2, either threshold is acceptable, but r^2 < 0.2 is common practice for European populations.
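The two steps discussed in this answer, a MAF filter followed by pruning on an r^2 threshold, can be sketched roughly as follows. This is not a reimplementation of SNPRelate::snpgdsLDpruning: it is a simplified greedy pass that uses the squared Pearson correlation of 0/1/2 genotype counts as a stand-in for r^2, with a window measured in SNP positions rather than base pairs. The function names, the `window` parameter and the toy genotypes are all made up for illustration.

```python
def maf(genotypes):
    """Minor allele frequency from 0/1/2 allele counts for one SNP."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1 - freq)


def genotype_r2(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def prune(snps, ld_threshold=0.2, min_maf=0.05, window=50):
    """Return indices of SNPs kept after MAF filtering and greedy LD pruning.

    A SNP is kept only if its r^2 with every already-kept SNP within
    `window` positions is at or below `ld_threshold`.
    """
    kept = []
    for i, g in enumerate(snps):
        if maf(g) < min_maf:
            continue  # drop rare variants before any LD comparison
        nearby = [j for j in kept if i - j <= window]
        if all(genotype_r2(g, snps[j]) <= ld_threshold for j in nearby):
            kept.append(i)
    return kept


# Toy data: 4 SNPs (rows) genotyped in 8 samples (columns).
snps = [
    [0, 1, 2, 0, 1, 2, 0, 1],  # SNP 0
    [0, 1, 2, 0, 1, 2, 0, 2],  # SNP 1: nearly identical to SNP 0
    [2, 0, 1, 1, 0, 2, 1, 0],  # SNP 2: close to uncorrelated with SNP 0
    [0, 0, 0, 0, 0, 0, 0, 1],  # SNP 3: rare variant
]
print(prune(snps, ld_threshold=0.2, min_maf=0.1))
print(prune(snps, ld_threshold=0.9, min_maf=0.1))
```

Comparing the two calls on your own data, across a few thresholds, is a quick way to see how sharply the retained SNP count responds to the choice of r^2.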


Thank you for your detailed and thoughtful reply.
