Window-size in PLINK's indep-pairwise LD pruning
6.3 years ago
solion • 0

I am pruning datasets of varying SNP density using PLINK --indep-pairwise, comparing different r2 cut-offs. The density ranges from that of the 1000 Genomes phase 3 data (e.g. >6 million SNPs on chr 2) to that of SNP-array data (20,000 SNPs on chr 2). While doing so, I want to keep the other parameters (window size and step size, i.e. the frame shift) constant.

My current parameter choices are: --indep-pairwise 10000 1000 [r2-cut-off, which varies in a range from 0.5-0.95]

Other than run-time, is there a downside to choosing a large window size like 10000 on less dense data, or to choosing a high r2 cut-off?

genome SNP plink pruning • 14k views
6.3 years ago

I would first filter for the variants that are common to (i.e., shared between) both datasets and then do the pruning. Otherwise, my feeling is that your results would be biased because the genotype densities are different. This is possible in PLINK by first outputting the variant IDs for one dataset as a list, and then using this list to filter the other (and vice-versa).
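To make the "shared variants" step concrete, here is a minimal sketch in Python rather than in PLINK itself. It reads the variant IDs from two .bim files (the ID is the second column) and writes the intersection to a list, which could then be handed to PLINK's --extract on each dataset. The file names and the helper functions `read_bim_ids` and `write_shared_ids` are made up for illustration, and matching on ID assumes both datasets use the same naming scheme and genome build, which you would need to check.

```python
import os
import tempfile


def read_bim_ids(path):
    """Return the set of variant IDs (2nd column) found in a PLINK .bim file."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                ids.add(fields[1])
    return ids


def write_shared_ids(bim_a, bim_b, out_path):
    """Write IDs present in both .bim files, one per line, and return them sorted."""
    shared = sorted(read_bim_ids(bim_a) & read_bim_ids(bim_b))
    with open(out_path, "w") as out:
        for variant_id in shared:
            out.write(variant_id + "\n")
    return shared


# Toy demonstration with two tiny, invented .bim files.
tmp_dir = tempfile.mkdtemp()
dense_bim = os.path.join(tmp_dir, "dense.bim")   # hypothetical file name
array_bim = os.path.join(tmp_dir, "array.bim")   # hypothetical file name

with open(dense_bim, "w") as f:
    f.write("2\trs1\t0\t1000\tA\tG\n2\trs2\t0\t2000\tC\tT\n2\trs3\t0\t3000\tG\tA\n")
with open(array_bim, "w") as f:
    f.write("2\trs2\t0\t2000\tC\tT\n2\trs3\t0\t3000\tG\tA\n2\trs4\t0\t4000\tT\tC\n")

shared_path = os.path.join(tmp_dir, "shared.snplist")
shared = write_shared_ids(dense_bim, array_bim, shared_path)
print(shared)
```

The resulting list file is plain text with one ID per line, which is the layout --extract expects.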

Your large choice for window size is not necessary. The window size is given as a number of SNPs (variants) unless a 'kb' unit is specified. Typically, a window size of just 50 (i.e., 50 SNPs) is chosen. You would probably crash your system by choosing 10 000 (?). By choosing 10 000, LD will be calculated on a pairwise basis between all 10 000 SNPs in the window, which is 10 000 × 9 999 / 2, or roughly 50 000 000, unique pairs (about 100 000 000 only if each pair is counted in both directions). On dense data this will be repeated many 1 000s of times as the window moves across each chromosome's SNPs.
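For a feel of the numbers, here is a rough back-of-the-envelope sketch in Python. It counts the unique SNP pairs in one window and the approximate number of window positions along a chromosome. The functions `pairs_in_window` and `n_windows` are my own illustrative helpers, not anything from PLINK, and multiplying the two gives only a naive upper bound: it assumes every window recomputes all of its pairs, which PLINK's actual implementation may well avoid.

```python
import math


def pairs_in_window(window):
    """Number of unique unordered SNP pairs among `window` SNPs."""
    return window * (window - 1) // 2


def n_windows(n_snps, window, step):
    """Approximate number of window positions when sliding by `step` SNPs."""
    if n_snps <= window:
        return 1
    return math.ceil((n_snps - window) / step) + 1


# Compare the commonly used window of 50 with the window of 10 000 from the question.
for window in (50, 10_000):
    print(window, pairs_in_window(window))

# Window positions for the two densities mentioned in the question (step = 1000).
for n_snps in (20_000, 6_000_000):
    print(n_snps, n_windows(n_snps, window=10_000, step=1_000))
```

With a step of 1000, the array-density chromosome only has about a dozen window positions, whereas the 1000 Genomes density has thousands, so the cost of a 10 000-SNP window falls mostly on the dense dataset.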

Take a look at my tutorial, where I actually merge a sample dataset with the 1000 Genomes Phase III data: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format

Kevin
