Question: How Does Ld In Plink Work?
5
gravatar for Javier2013
6.3 years ago by
Javier2013120
Javier2013120 wrote:

Hi everyone:

I am working with genome wide data (SNPs) using plink and want to plot and LD decay graph (something like this http://postimg.org/image/5bkdju6jx/)

I know that several authors use the --r2 command implemented in plink but I do not understand how to make a plot like this with the plink output.

Here is the output im getting: http://postimg.org/image/87hq8nodp/

I was doing some testing with the LWK population (Hapmap 3) using the default values of the --r2 command.

Can someone explain me how to interpret these results?

In particular I don't know what this command means: --ld-window, not even after reading the description in the plink website.

Thank you in advance.

snps plink ld • 13k views
ADD COMMENTlink modified 6.1 years ago by msutada0 • written 6.3 years ago by Javier2013120

fyi, it's not a big deal with small windows, but if you're doing any heavy-duty --r2 runs, PLINK 1.9 is much faster than PLINK 1.07.

ADD REPLYlink written 6.2 years ago by chrchang5236.5k

hello, i need help in creating map and ped files in finding linkage disequilibrium for the data from dbsnp, am confused because in dbsnp some snps have 2-3 allele changes for same physical position and few snps have many physical positions,so am confused in creating those files. please help me

ADD REPLYlink written 23 months ago by 14311a23010
11
gravatar for Clare
6.3 years ago by
Clare130
United States
Clare130 wrote:

So there are various plink options

--ld-window-kb 1000 # This means only assess LD between SNPs within a specified window i.e. here within 1000 kb. With whole genome data, there are so many SNPs, you don't want to compare every snp to every other snp - the outfile would be too big. So this specifies to only compare ones within a certain distance. You should set this based on your knowledge of your species. i.e. if LD decays very quickly then this can be smaller. If LD is long in your taxon, this should be longer. In the example you provide this is what they mean by pairs of SNPs less than 70kb apart, i.e. they set --ld-window-kb 70

 --ld-window-r2 0.2  # This means only output the r2 values greater than 0.2. (default is 0.2) So for an LD decay plot you may want to change that to 0 otherwise your averages may be off - see below (but this will make the file enormous, so you might want to do everything separately for each chromosome).

To get the the LD decay plot you need to do something like the following. First of all - generate a text file with two columns - the first is the distance between the two SNPs (i.e. BP_B - BP_A, so if the first snp is at position 1000 on the chromosome, and the second is at position 2150, the distance is 1150 ) and the second is the corresponding R2 value (yoru last column). So you get a file with R2 values for SNPs certain distances apart. distance R2 5 0.2 5 0.3 67 0.2 67 0.4 67 0.5

Then for each distance apart, calculate an average R2 (you need to generate a script to do this e.g. in python/perl) i.e. distance averageR2 5 0.25 67 0.3667

Then plot that file to get the LD decay.

Depending on how many points you have, you may want to using a sliding window average script for the plot. In the example, I think this is what they did - in terms of 1kb intervals, i.e. they did a sliding window - with a window size of 1kb and step size of 1kb, and calculated average r2.

ADD COMMENTlink modified 6.3 years ago • written 6.3 years ago by Clare130

slightly updated from first post.....

ADD REPLYlink written 6.3 years ago by Clare130

Hi Clare! Thank you so much for your reply. I think I understand it now. So, just to get things clear, they did 70 r2 averages, for each sliding windows of 1kb size between all pairs of snps?

ADD REPLYlink written 6.3 years ago by Javier2013120
5
gravatar for Javier2013
6.2 years ago by
Javier201390
UPF
Javier201390 wrote:

Hi:

If anyone is interested I finally got a logical result. My problem was that my genetic map was not accurate; thus my LD was completely wrong.

The parameters I used in plink were these:

--r2 --ld-window-r2 0 --r2 --ld-window 99999 --ld-window-kb 70

Then I just this script to plot the result (in R):

LD <- read.table("your_plinkresult.ld", header = T) ##read your .ld data

LD$distancekb <- with (LD, LD$BP_B-LD$BP_A)/1000 ## the distance between snp1 and snp2 in kb

LD$grp <- cut(LD$distancekb, 0:70) ## bin 70kb

r2means <- with (LD, tapply(LD$R2, LD$grp, FUN = mean)) ##r2 mean every 1kb

Javier

ADD COMMENTlink modified 6.2 years ago • written 6.2 years ago by Javier201390

So --ld-window-kb specifies the window in which to compare snps. What is --ld-window 99999?

ADD REPLYlink written 2.0 years ago by rmf900
1
gravatar for Javier2013
6.3 years ago by
Javier2013120
Javier2013120 wrote:

Hi, I just finished with the script but the values that I get are kind of odd. For instance, this is an african hapmap3 sample and by 70kb I still get an average r2 of 0.26 which is a lot, right? Could this be because it is a prunned data set?

Here is a picture of the script plus the results... http://s21.postimg.org/xag70922v/Screen_Shot_2013_10_26_at_2_30_18_AM.png

Thanks again!

ADD COMMENTlink modified 6.3 years ago • written 6.3 years ago by Javier2013120

I tried using a non-prunned data set but I get the same results. I guess then its something with the parameters I am using. I would really appreciated it if someone could help me out with this.

Here is my parameter file: Options in effect: --noweb --bfile lwk --r2 --ld-window-kb 70 --ld-window-r2 0 --ld-window 20 --out lwk_70kb_20spns

Thanks!

ADD REPLYlink written 6.3 years ago by Javier2013120

Sorry for my absence - I haven't been checking my email. I wonder if its the --ld-window 20 command. With this you are saying that you only want to analyse SNPS that are not more 20 SNPs apart. I'm not sure why you would want to do that for your task you are trying to do. Try removing that option but keeping all the others. Hopefully that will fix it.

ADD REPLYlink written 6.3 years ago by Clare130

Hi Clare,

I just tried rerunning the test without that option but the results are pretty much the same! I do not understand why. I makes no sense given the fact that it is an African population. I do not know what else to do.

ADD REPLYlink written 6.2 years ago by Javier201390
0
gravatar for msutada
6.1 years ago by
msutada0
msutada0 wrote:

Hi Javier,

After get r2means then, which function do you use to make the plot as mention above?

Thank you!!

ADD COMMENTlink written 6.1 years ago by msutada0

I only used R to plot the result with the script mentioned above.

ADD REPLYlink written 6.0 years ago by GabrielMontenegro540
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2192 users visited in the last hour