6.3 years ago by
So there are various plink options
--ld-window-kb 1000 # This means only assess LD between SNPs within a specified window i.e. here within 1000 kb. With whole genome data, there are so many SNPs, you don't want to compare every snp to every other snp - the outfile would be too big. So this specifies to only compare ones within a certain distance. You should set this based on your knowledge of your species. i.e. if LD decays very quickly then this can be smaller. If LD is long in your taxon, this should be longer. In the example you provide this is what they mean by pairs of SNPs less than 70kb apart, i.e. they set --ld-window-kb 70
--ld-window-r2 0.2 # This means only output the r2 values greater than 0.2. (default is 0.2) So for an LD decay plot you may want to change that to 0 otherwise your averages may be off - see below (but this will make the file enormous, so you might want to do everything separately for each chromosome).
To get the the LD decay plot you need to do something like the following.
First of all - generate a text file with two columns - the first is the distance between the two SNPs (i.e. BP_B - BP_A, so if the first snp is at position 1000 on the chromosome, and the second is at position 2150, the distance is 1150 ) and the second is the corresponding R2 value (yoru last column).
So you get a file with R2 values for SNPs certain distances apart.
Then for each distance apart, calculate an average R2 (you need to generate a script to do this e.g. in python/perl)
Then plot that file to get the LD decay.
Depending on how many points you have, you may want to using a sliding window average script for the plot.
In the example, I think this is what they did - in terms of 1kb intervals, i.e. they did a sliding window - with a window size of 1kb and step size of 1kb, and calculated average r2.
modified 6.3 years ago
6.3 years ago by
Clare • 130