Question

How Does Ld In Plink Work?

5

Entering edit mode

11.7 years ago

Javier2013 ▴ 130

Hi everyone:

I am working with genome wide data (SNPs) using plink and want to plot and LD decay graph (something like this http://postimg.org/image/5bkdju6jx/)

I know that several authors use the --r2 command implemented in plink but I do not understand how to make a plot like this with the plink output.

Here is the output im getting: http://postimg.org/image/87hq8nodp/

I was doing some testing with the LWK population (Hapmap 3) using the default values of the --r2 command.

Can someone explain me how to interpret these results?

In particular I don't know what this command means: --ld-window, not even after reading the description in the plink website.

Thank you in advance.

plink ld snps • 24k views

ADD COMMENT • link updated 11.4 years ago by msutada • 0 • written 11.7 years ago by Javier2013 ▴ 130

0

Entering edit mode

fyi, it's not a big deal with small windows, but if you're doing any heavy-duty --r2 runs, PLINK 1.9 is much faster than PLINK 1.07.

ADD REPLY • link 11.6 years ago by chrchang523 11k

0

Entering edit mode

hello, i need help in creating map and ped files in finding linkage disequilibrium for the data from dbsnp, am confused because in dbsnp some snps have 2-3 allele changes for same physical position and few snps have many physical positions,so am confused in creating those files. please help me

ADD REPLY • link 7.3 years ago by 14311a2301 • 0

score 15 · Answer 1 · 2013-10-24

So there are various plink options

--ld-window-kb 1000 # This means only assess LD between SNPs within a specified window i.e. here within 1000 kb. With whole genome data, there are so many SNPs, you don't want to compare every snp to every other snp - the outfile would be too big. So this specifies to only compare ones within a certain distance. You should set this based on your knowledge of your species. i.e. if LD decays very quickly then this can be smaller. If LD is long in your taxon, this should be longer. In the example you provide this is what they mean by pairs of SNPs less than 70kb apart, i.e. they set --ld-window-kb 70

 --ld-window-r2 0.2  # This means only output the r2 values greater than 0.2. (default is 0.2) So for an LD decay plot you may want to change that to 0 otherwise your averages may be off - see below (but this will make the file enormous, so you might want to do everything separately for each chromosome).

To get the the LD decay plot you need to do something like the following. First of all - generate a text file with two columns - the first is the distance between the two SNPs (i.e. BP_B - BP_A, so if the first snp is at position 1000 on the chromosome, and the second is at position 2150, the distance is 1150 ) and the second is the corresponding R2 value (yoru last column). So you get a file with R2 values for SNPs certain distances apart. distance R2 5 0.2 5 0.3 67 0.2 67 0.4 67 0.5

Then for each distance apart, calculate an average R2 (you need to generate a script to do this e.g. in python/perl) i.e. distance averageR2 5 0.25 67 0.3667

Then plot that file to get the LD decay.

Depending on how many points you have, you may want to using a sliding window average script for the plot. In the example, I think this is what they did - in terms of 1kb intervals, i.e. they did a sliding window - with a window size of 1kb and step size of 1kb, and calculated average r2.

score 5 · Answer 2 · 2013-12-06

5

Entering edit mode

11.6 years ago

Javier2013 ▴ 90

Hi:

If anyone is interested I finally got a logical result. My problem was that my genetic map was not accurate; thus my LD was completely wrong.

The parameters I used in plink were these:

--r2 --ld-window-r2 0 --r2 --ld-window 99999 --ld-window-kb 70

Then I just this script to plot the result (in R):

LD <- read.table("your_plinkresult.ld", header = T) ##read your .ld data

LD$distancekb <- with (LD, LD$BP_B-LD$BP_A)/1000 ## the distance between snp1 and snp2 in kb

LD$grp <- cut(LD$distancekb, 0:70) ## bin 70kb

r2means <- with (LD, tapply(LD$R2, LD$grp, FUN = mean)) ##r2 mean every 1kb

Javier

ADD COMMENT • link 11.6 years ago by Javier2013 ▴ 90

0

Entering edit mode

So --ld-window-kb specifies the window in which to compare snps. What is --ld-window 99999?

ADD REPLY • link 7.4 years ago by firestar ★ 1.7k

score 1 · Answer 3 · 2013-10-25

1

Entering edit mode

11.7 years ago

Javier2013 ▴ 130

Hi, I just finished with the script but the values that I get are kind of odd. For instance, this is an african hapmap3 sample and by 70kb I still get an average r2 of 0.26 which is a lot, right? Could this be because it is a prunned data set?

Here is a picture of the script plus the results... http://s21.postimg.org/xag70922v/Screen_Shot_2013_10_26_at_2_30_18_AM.png

Thanks again!

ADD COMMENT • link 11.7 years ago by Javier2013 ▴ 130

0

Entering edit mode

I tried using a non-prunned data set but I get the same results. I guess then its something with the parameters I am using. I would really appreciated it if someone could help me out with this.

Here is my parameter file: Options in effect: --noweb --bfile lwk --r2 --ld-window-kb 70 --ld-window-r2 0 --ld-window 20 --out lwk_70kb_20spns

Thanks!

ADD REPLY • link 11.7 years ago by Javier2013 ▴ 130

0

Entering edit mode

Sorry for my absence - I haven't been checking my email. I wonder if its the --ld-window 20 command. With this you are saying that you only want to analyse SNPS that are not more 20 SNPs apart. I'm not sure why you would want to do that for your task you are trying to do. Try removing that option but keeping all the others. Hopefully that will fix it.

ADD REPLY • link 11.7 years ago by Clare ▴ 180

0

Entering edit mode

Hi Clare,

I just tried rerunning the test without that option but the results are pretty much the same! I do not understand why. I makes no sense given the fact that it is an African population. I do not know what else to do.

ADD REPLY • link 11.6 years ago by Javier2013 ▴ 90

score 0 · Answer 4 · 2014-01-30

0

Entering edit mode

11.4 years ago

msutada • 0

Hi Javier,

After get r2means then, which function do you use to make the plot as mention above?

Thank you!!

ADD COMMENT • link 11.4 years ago by msutada • 0

0

Entering edit mode

I only used R to plot the result with the script mentioned above.

ADD REPLY • link 11.4 years ago by GabrielMontenegro ▴ 680