Question: [SOLVED] Get LD data for any two SNPs
0
Mike Dacre (Stanford, CA) wrote 4 months ago:

Is there a good way to take any two SNPs and pull out the LD between them, particularly the R-squared and the directionality of the linkage (e.g. A in SNP1 occurs with G in SNP2 95% of the time)?

Obviously I can do this manually, but I want to do it for a list of several thousand SNPs, so I am hoping for a scalable solution. Right now I can't find anything, and it looks like I will have to come up with my own solution using vcftools and the 1000 Genomes data.

Thanks!

modified 3 months ago • written 4 months ago by Mike Dacre

It turns out that LDlink does this beautifully for a single pair of rsIDs, but it doesn't seem to work in batch mode: https://analysistools.nci.nih.gov/LDlink/

written 4 months ago by Mike Dacre
1

You may be able to get your results from LDlink with some scripting as well, though that might require parsing the HTML. For example, you can construct URLs programmatically pretty easily: https://analysistools.nci.nih.gov/LDlink/?var1=rs1042779&var2=rs6792369&pop=YRI%2BLWK%2BGWD%2BMSL%2BESN%2BASW%2BACB%2BMXL%2BPUR%2BCLM%2BPEL%2BCHB%2BJPT%2BCHS%2BCDX%2BKHV%2BCEU%2BTSI%2BFIN%2BGBR%2BIBS%2BGIH%2BPJL%2BBEB%2BSTU%2BITU&tab=ldpair
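
A minimal sketch of building such URLs in Python, assuming the query layout (var1, var2, pop, tab) stays as in the link above. The helper name ldpair_url is made up for illustration, and fetching or parsing the page is left out, since the page structure is not something I can vouch for.

```python
from urllib.parse import urlencode

# Base URL copied from the link in this thread; the query layout is assumed
# to match that link and may differ on the live site.
LDLINK_BASE = "https://analysistools.nci.nih.gov/LDlink/"


def ldpair_url(snp1, snp2, populations):
    """Build an LDpair page URL for one pair of rsIDs (hypothetical helper)."""
    query = urlencode({
        "var1": snp1,
        "var2": snp2,
        "pop": "+".join(populations),  # urlencode escapes '+' as %2B
        "tab": "ldpair",
    })
    return f"{LDLINK_BASE}?{query}"


# Build one URL per pair in a list of SNP pairs.
pairs = [("rs1042779", "rs6792369")]
for first, second in pairs:
    print(ldpair_url(first, second, ["CEU", "YRI"]))
```

If you do loop over thousands of pairs, it is probably wise to pause between requests, since the rate the server tolerates is unknown.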

written 4 months ago by cmdcolin

Yes! It looks like that does work. I am not sure how many queries their API can tolerate, but I am going to try this tonight to see. I also want to compare this to running the calculations with plink/vcftools to see which is faster and more stable. Thanks!

written 4 months ago by Mike Dacre

plink 1.9 offers this functionality, as answered by @chrchang523.

In case anyone else wants to do this, I wrote a little package based on plink and LDlink that allows many-to-many LD lookup. Given two SNP lists, it builds a list of LD pairs between each SNP in the first list and every SNP in the second list, filtered by distance and R-squared. With a first list of 40,000 SNPs and a second list of ~10 million risk alleles, it runs in a couple of hours.

The output includes phased SNP data to answer the question 'given allele X in SNP A, what is the allele in SNP B?' for every possible pair.

All of this is just done by some basic math, running plink a bunch of times, and parsing the output. Hopefully it helps someone else.
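
For anyone curious what the "basic math" looks like for a single pair, here is a rough sketch. It assumes you already have phased haplotype counts for two biallelic SNPs (alleles A/a at the first, B/b at the second). The function name and argument layout are hypothetical and are not taken from the package described above.

```python
def ld_from_haplotype_counts(n_AB, n_Ab, n_aB, n_ab):
    """Return (r_squared, D, P(B | A)) from phased haplotype counts.

    Hypothetical helper: n_AB is the number of haplotypes carrying allele A
    at SNP1 together with allele B at SNP2, and so on for the other three.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")

    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total   # frequency of allele A at SNP1
    p_B = (n_AB + n_aB) / total   # frequency of allele B at SNP2

    d = p_AB - p_A * p_B          # linkage disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    # r-squared is undefined when either SNP is monomorphic.
    r_squared = d * d / denom if denom > 0 else float("nan")

    # Directionality: how often B is seen on haplotypes that carry A.
    carriers_of_A = n_AB + n_Ab
    p_B_given_A = n_AB / carriers_of_A if carriers_of_A > 0 else float("nan")
    return r_squared, d, p_B_given_A


r2, d, p_b_given_a = ld_from_haplotype_counts(45, 5, 5, 45)
print(f"r2={r2:.2f} D={d:.2f} P(B|A)={p_b_given_a:.2f}")
```

The sign of D tells you which alleles travel together: positive means A tends to occur with B, negative means A tends to occur with b.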

written 3 months ago by Mike Dacre
2
chrchang523 (United States) wrote 4 months ago:

"plink --r2 in-phase" provides both r-squared and directionality; see https://www.cog-genomics.org/plink/1.9/ld#r .
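
A sketch of how that command might be assembled for a list of query SNPs, in Python so it can be dropped into a larger script. The file names (mydata, snps.txt, ld_out) and the helper name are placeholders, and the window settings are only example values. The flags themselves (--bfile, --r2 in-phase, --ld-snp-list, --ld-window-kb, --ld-window, --ld-window-r2, --out) are from the plink 1.9 LD documentation linked above.

```python
def build_plink_r2_command(bfile_prefix, snp_list_path, out_prefix,
                           window_kb=1000, min_r2=0.2):
    """Return the argv list for a plink 1.9 pairwise LD run (hypothetical helper).

    Assumes a 'plink' 1.9 binary is on PATH and that bfile_prefix points at
    a .bed/.bim/.fam fileset; both are assumptions, not facts from the thread.
    """
    return [
        "plink",
        "--bfile", bfile_prefix,
        "--r2", "in-phase",                 # adds the in-phase allele column
        "--ld-snp-list", snp_list_path,     # only report pairs involving these SNPs
        "--ld-window-kb", str(window_kb),   # maximum distance between paired SNPs
        "--ld-window", "99999",             # lift the default variant-count limit
        "--ld-window-r2", str(min_r2),      # drop pairs below this r-squared
        "--out", out_prefix,
    ]


cmd = build_plink_r2_command("mydata", "snps.txt", "ld_out")
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

With these settings plink should write its table to ld_out.ld.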

written 4 months ago by chrchang523
0
cmdcolin (United States) wrote 4 months ago:

The Ensembl REST API offers this function: http://rest.ensembl.org/documentation/info/ld_pairwise_get

written 4 months ago by cmdcolin

Note that you can also use 1000GENOMES:phase_3:ALL instead of a specific population, for example: http://rest.ensembl.org/ld/human/pairwise/rs6792369/rs1042779?content-type=application/json;population_name=1000GENOMES:phase_3:ALL
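
A small sketch for generating these request URLs for many pairs, assuming the path and parameter layout shown in the example URL above. The helper name is hypothetical, and the actual HTTP request and JSON handling are left out.

```python
from urllib.parse import quote

# Host and path layout copied from the example URL in this thread.
ENSEMBL_REST = "http://rest.ensembl.org"


def ensembl_ld_pairwise_url(snp1, snp2,
                            population="1000GENOMES:phase_3:ALL",
                            species="human"):
    """Build a pairwise LD request URL (hypothetical helper)."""
    path = f"/ld/{quote(species)}/pairwise/{quote(snp1)}/{quote(snp2)}"
    params = {
        "content-type": "application/json",
        "population_name": population,
    }
    # Parameters are joined with ';' to mirror the example URL; ':' and '/'
    # are left unescaped for the same reason.
    query = ";".join(f"{key}={quote(value, safe=':/')}"
                     for key, value in params.items())
    return f"{ENSEMBL_REST}{path}?{query}"


print(ensembl_ld_pairwise_url("rs6792369", "rs1042779"))
```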

written 4 months ago by cmdcolin

Thanks! That is great, but unfortunately it doesn't include the directionality. What I really need to know is what SNP2 is given some value for SNP1 (i.e. SNP2 is T 90% of the time when SNP1 is G).

written 4 months ago by Mike Dacre
1

I see! Perhaps the Ensembl team would be interested in adding that function.

written 4 months ago by cmdcolin