Window and step size for LD-independent SNPs
6.7 years ago
willgilks ▴ 360


I'm trying to determine which of my SNPs from whole-genome NGS are independent of one another, i.e. not in linkage disequilibrium (LD) with each other, much like tagging SNPs. The downstream use is testing genotypes against a trait: knowing the number of independent tests generally makes it possible to set a significance threshold.
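LD between two SNPs is usually summarised as r2. A minimal sketch, assuming genotypes are coded as allele dosages (0, 1, 2) with no missing calls and no monomorphic SNPs, computes it as the squared Pearson correlation of the two dosage vectors (genotypic r2, a common stand-in when haplotypes are unphased). The function name and the toy genotypes are hypothetical.

```python
import numpy as np

def genotype_r2(g1, g2):
    """Squared Pearson correlation between two SNPs coded as dosages 0/1/2.

    Assumes no missing genotypes and that neither SNP is monomorphic
    (a constant vector has no defined correlation).
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    r = np.corrcoef(g1, g2)[0, 1]
    return r * r

# Hypothetical SNPs genotyped in six individuals
snp_a = [0, 1, 2, 2, 1, 0]
snp_b = [0, 1, 2, 2, 1, 0]   # identical to snp_a, so r2 is ~1 (full LD)
snp_c = [2, 1, 0, 0, 1, 2]   # perfectly anti-correlated, r2 is also ~1
snp_d = [0, 0, 1, 1, 2, 2]   # uncorrelated with snp_a, r2 is ~0
print(genotype_r2(snp_a, snp_b), genotype_r2(snp_a, snp_c), genotype_r2(snp_a, snp_d))
```

Note that r2 ignores the sign of the correlation, so a SNP and its "mirror image" are treated as fully redundant.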

I've tried different window and step sizes, and (not surprisingly) they give quite a range of results, both in the number of independent SNPs and in the population structure.

How does one select the correct window and step sizes?

My guess is that larger windows are preferred if one has the RAM, and that using a step containing more SNPs than the window will result in ignoring much of the genome. Large windows with small steps might give a more random selection of SNPs.
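To make the window/step trade-off concrete, here is a simplified sketch of greedy sliding-window pruning with both the window and the step counted in SNPs. It is not PLINK's exact algorithm (PLINK has its own rule for which SNP of a correlated pair to drop); the function name, the "drop the later SNP" rule and the toy data are assumptions for illustration. It shows the point above: when the step is larger than the window, some SNPs are never compared with their neighbours.

```python
import numpy as np

def prune_sliding_window(genotypes, window, step, r2_max):
    """Simplified greedy LD pruning (illustrative, not PLINK's exact rule).

    genotypes: array (n_snps, n_individuals) of 0/1/2 dosages, SNPs in
    genomic order, no missing calls, no monomorphic SNPs.
    window, step: sizes in number of SNPs.
    Within each window, if two still-kept SNPs have r2 > r2_max,
    the later SNP of the pair is dropped. Returns indices of kept SNPs.
    """
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    keep = np.ones(n, dtype=bool)
    start = 0
    while start < n:
        # SNPs in the current window that have not been pruned yet
        idx = [i for i in range(start, min(start + window, n)) if keep[i]]
        for pos, i in enumerate(idx):
            if not keep[i]:
                continue
            for j in idx[pos + 1:]:
                if keep[j] and np.corrcoef(g[i], g[j])[0, 1] ** 2 > r2_max:
                    keep[j] = False
        # SNPs between start + window and start + step are never tested
        # when step > window; they are simply kept
        start += step
    return np.flatnonzero(keep)

# Toy data: SNPs 0/1 are identical, SNPs 2/3 are identical,
# and the two pairs are uncorrelated with each other
G = np.array([
    [0, 1, 2, 2, 1, 0],
    [0, 1, 2, 2, 1, 0],
    [0, 0, 1, 1, 2, 2],
    [0, 0, 1, 1, 2, 2],
])
print(prune_sliding_window(G, window=2, step=1, r2_max=0.5))
print(prune_sliding_window(G, window=2, step=3, r2_max=0.5))
```

With a step of 1 both redundant pairs are caught and SNPs 0 and 2 remain; with a window of 2 and a step of 3, SNPs 2 and 3 never share a window, so both survive and the "independent" count is inflated.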

At an intermediate step size of 100 SNPs, the number of independent SNPs drops from 246k to 26k as the window size increases from 10 kb to 50 kb. That is roughly a ten-fold difference, which matters a great deal when counting independent tests.
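The post doesn't name a correction method. Assuming a Bonferroni correction at a family-wise alpha of 0.05 (both are assumptions, and the helper name is hypothetical), the arithmetic below shows how far the two counts would move the per-test threshold.

```python
def bonferroni_threshold(n_independent_tests, alpha=0.05):
    """Per-test p-value threshold giving a family-wise error rate of alpha."""
    return alpha / n_independent_tests

n_10kb = 246_000  # independent SNPs with a 10 kb window (step 100 SNPs)
n_50kb = 26_000   # independent SNPs with a 50 kb window (step 100 SNPs)

fold = n_10kb / n_50kb
print(f"fold difference: {fold:.1f}")                                   # 9.5
print(f"threshold, 10 kb window: {bonferroni_threshold(n_10kb):.2e}")   # 2.03e-07
print(f"threshold, 50 kb window: {bonferroni_threshold(n_50kb):.2e}")   # 1.92e-06
```

So the choice of window alone shifts the per-test threshold by almost an order of magnitude (about one unit on the -log10(p) scale).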

The sample population is 220 Drosophila melanogaster, and the points on the scatter plots are coloured by sequencing order. On average, pairwise r2 levels out after about 250 bp, and there is one SNP every 150 bp. The separation of the population into four groups is supported by an Fst analysis showing two distinct genomic regions responsible for it.
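Because the windows here are given in kb but the step in SNPs, a rough conversion using the average density above can help compare them. This sketch assumes SNPs are evenly spaced at one per 150 bp, which real data won't be; the constant and function names are hypothetical.

```python
BP_PER_SNP = 150    # average SNP spacing reported above (assumed uniform)
LD_DECAY_BP = 250   # distance beyond which mean pairwise r2 levels out
STEP_SNPS = 100     # step size used in the comparison above

def snps_per_window(window_bp, bp_per_snp=BP_PER_SNP):
    """Approximate number of SNPs in a window of window_bp base pairs."""
    return window_bp / bp_per_snp

for window_kb in (10, 50):
    print(f"{window_kb} kb window ~ {snps_per_window(window_kb * 1000):.0f} SNPs")
print(f"{STEP_SNPS}-SNP step ~ {STEP_SNPS * BP_PER_SNP / 1000:.0f} kb")
print(f"LD decay distance ~ {snps_per_window(LD_DECAY_BP):.1f} SNPs")
```

Under this conversion a 10 kb window holds about 67 SNPs and a 50 kb window about 333, while a 100-SNP step spans roughly 15 kb. Depending on how the pruning tool combines a kb window with a SNP-count step, the 10 kb setting may therefore have a step longer than its window, which by the reasoning about step and window sizes above could leave some SNPs untested. Both windows are in any case far wider than the ~250 bp (under two SNPs) over which r2 decays.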

These are the plots and code

Any nice comments, and especially useful ones, are most welcome. Questions are welcome too.




Tags: r2, ld, plink, snp
