How Does Ld In Plink Work?
4
5
Entering edit mode
8.1 years ago
Javier2013 ▴ 130

Hi everyone:

I am working with genome wide data (SNPs) using plink and want to plot and LD decay graph (something like this http://postimg.org/image/5bkdju6jx/)

I know that several authors use the --r2 command implemented in plink but I do not understand how to make a plot like this with the plink output.

Here is the output im getting: http://postimg.org/image/87hq8nodp/

I was doing some testing with the LWK population (Hapmap 3) using the default values of the --r2 command.

Can someone explain me how to interpret these results?

In particular I don't know what this command means: --ld-window, not even after reading the description in the plink website.

Thank you in advance.

plink ld snps • 17k views
0
Entering edit mode

fyi, it's not a big deal with small windows, but if you're doing any heavy-duty --r2 runs, PLINK 1.9 is much faster than PLINK 1.07.

0
Entering edit mode

hello, i need help in creating map and ped files in finding linkage disequilibrium for the data from dbsnp, am confused because in dbsnp some snps have 2-3 allele changes for same physical position and few snps have many physical positions,so am confused in creating those files. please help me

15
Entering edit mode
8.1 years ago
Clare ▴ 170

So there are various plink options

--ld-window-kb 1000 # This means only assess LD between SNPs within a specified window i.e. here within 1000 kb. With whole genome data, there are so many SNPs, you don't want to compare every snp to every other snp - the outfile would be too big. So this specifies to only compare ones within a certain distance. You should set this based on your knowledge of your species. i.e. if LD decays very quickly then this can be smaller. If LD is long in your taxon, this should be longer. In the example you provide this is what they mean by pairs of SNPs less than 70kb apart, i.e. they set --ld-window-kb 70

 --ld-window-r2 0.2  # This means only output the r2 values greater than 0.2. (default is 0.2) So for an LD decay plot you may want to change that to 0 otherwise your averages may be off - see below (but this will make the file enormous, so you might want to do everything separately for each chromosome).


To get the the LD decay plot you need to do something like the following. First of all - generate a text file with two columns - the first is the distance between the two SNPs (i.e. BP_B - BP_A, so if the first snp is at position 1000 on the chromosome, and the second is at position 2150, the distance is 1150 ) and the second is the corresponding R2 value (yoru last column). So you get a file with R2 values for SNPs certain distances apart. distance R2 5 0.2 5 0.3 67 0.2 67 0.4 67 0.5

Then for each distance apart, calculate an average R2 (you need to generate a script to do this e.g. in python/perl) i.e. distance averageR2 5 0.25 67 0.3667

Then plot that file to get the LD decay.

Depending on how many points you have, you may want to using a sliding window average script for the plot. In the example, I think this is what they did - in terms of 1kb intervals, i.e. they did a sliding window - with a window size of 1kb and step size of 1kb, and calculated average r2.

0
Entering edit mode

slightly updated from first post.....

0
Entering edit mode

Hi Clare! Thank you so much for your reply. I think I understand it now. So, just to get things clear, they did 70 r2 averages, for each sliding windows of 1kb size between all pairs of snps?

5
Entering edit mode
8.0 years ago
Javier2013 ▴ 90

Hi:

If anyone is interested I finally got a logical result. My problem was that my genetic map was not accurate; thus my LD was completely wrong.

The parameters I used in plink were these:

--r2 --ld-window-r2 0 --r2 --ld-window 99999 --ld-window-kb 70

Then I just this script to plot the result (in R):

LD$distancekb <- with (LD, LD$BP_B-LD$BP_A)/1000 ## the distance between snp1 and snp2 in kb LD$grp <- cut(LD$distancekb, 0:70) ## bin 70kb r2means <- with (LD, tapply(LD$R2, LD\$grp, FUN = mean)) ##r2 mean every 1kb

Javier

0
Entering edit mode

So --ld-window-kb specifies the window in which to compare snps. What is --ld-window 99999?

1
Entering edit mode
8.1 years ago
Javier2013 ▴ 130

Hi, I just finished with the script but the values that I get are kind of odd. For instance, this is an african hapmap3 sample and by 70kb I still get an average r2 of 0.26 which is a lot, right? Could this be because it is a prunned data set?

Here is a picture of the script plus the results... http://s21.postimg.org/xag70922v/Screen_Shot_2013_10_26_at_2_30_18_AM.png

Thanks again!

0
Entering edit mode

I tried using a non-prunned data set but I get the same results. I guess then its something with the parameters I am using. I would really appreciated it if someone could help me out with this.

Here is my parameter file: Options in effect: --noweb --bfile lwk --r2 --ld-window-kb 70 --ld-window-r2 0 --ld-window 20 --out lwk_70kb_20spns

Thanks!

0
Entering edit mode

Sorry for my absence - I haven't been checking my email. I wonder if its the --ld-window 20 command. With this you are saying that you only want to analyse SNPS that are not more 20 SNPs apart. I'm not sure why you would want to do that for your task you are trying to do. Try removing that option but keeping all the others. Hopefully that will fix it.

0
Entering edit mode

Hi Clare,

I just tried rerunning the test without that option but the results are pretty much the same! I do not understand why. I makes no sense given the fact that it is an African population. I do not know what else to do.

0
Entering edit mode
7.9 years ago

Hi Javier,

After get r2means then, which function do you use to make the plot as mention above?

Thank you!!

0
Entering edit mode

I only used R to plot the result with the script mentioned above.