Question: [SOLVED] Get LD data for any two SNPs
Mike Dacre (Stanford, CA) wrote:

Is there a good way to take any two SNPs and pull out the LD between them, particularly the R-squared and the directionality of the linkage (e.g. A in SNP1 occurs with G in SNP2 95% of the time)?

Obviously I can do this manually, but I want to do it for a list of several thousand SNPs, so I am hoping for a scalable solution. So far I can't find anything, and it looks like I will have to come up with my own solution using vcftools and the 1000 Genomes data.

Thanks!

modified 7 weeks ago • written 9 weeks ago by Mike Dacre

It turns out that LDlink does this beautifully for a single pair of rsIDs, but it doesn't seem to work in batch mode: https://analysistools.nci.nih.gov/LDlink/

written 9 weeks ago by Mike Dacre

You may be able to get your results from LDlink with some scripting as well, though it might require parsing the HTML. For example, you can construct URLs programmatically pretty easily: https://analysistools.nci.nih.gov/LDlink/?var1=rs1042779&var2=rs6792369&pop=YRI%2BLWK%2BGWD%2BMSL%2BESN%2BASW%2BACB%2BMXL%2BPUR%2BCLM%2BPEL%2BCHB%2BJPT%2BCHS%2BCDX%2BKHV%2BCEU%2BTSI%2BFIN%2BGBR%2BIBS%2BGIH%2BPJL%2BBEB%2BSTU%2BITU&tab=ldpair

written 9 weeks ago by cmdcolin
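
A minimal sketch of the URL construction suggested above, assuming the query parameters (`var1`, `var2`, `pop`, `tab`) behave as they appear in the example link. The function name `ldpair_url` is hypothetical, and whether the returned page is easy to parse or tolerates many requests is not tested here.

```python
from urllib.parse import urlencode

# Base address taken from the link in the thread.
LDLINK_BASE = "https://analysistools.nci.nih.gov/LDlink/"


def ldpair_url(snp1, snp2, populations):
    """Build an LDlink pair-lookup URL for two rsIDs (hypothetical helper).

    Population codes are joined with '+', which urlencode escapes as %2B,
    matching the form of the example link.
    """
    query = urlencode({
        "var1": snp1,
        "var2": snp2,
        "pop": "+".join(populations),
        "tab": "ldpair",
    })
    return f"{LDLINK_BASE}?{query}"


# Build one URL per pair; fetching and parsing the pages is left out.
pairs = [("rs1042779", "rs6792369")]
urls = [ldpair_url(a, b, ["CEU", "TSI"]) for a, b in pairs]
```

For several thousand pairs it would be polite to pause between requests, since the thread does not establish how many queries the service can handle.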

Yes! It looks like that does work. I am not sure how many queries their API can tolerate, but I am going to try this tonight to find out. I also want to compare this to running the calculations with plink/vcftools to see which is faster and more stable. Thanks!

written 9 weeks ago by Mike Dacre

plink1.9 offers this functionality, as answered by @chrchang523

In case anyone else wants to do this, I wrote a little package based on plink and LDlink that allows many-to-many LD lookup. Given two SNP lists, it creates a list of LD pairs between each SNP in the first list and every SNP in the second list, filtered by distance and R-squared. Given a first list of 40,000 SNPs and a second list of ~10 million risk alleles, it runs in a couple of hours.

The output includes phase information for each SNP pair, which answers the question "given allele X in SNP A, what is the allele in SNP B?" for every possible pair.

All of this is just done by some basic math, running plink a bunch of times, and parsing the output. Hopefully it helps someone else.

written 7 weeks ago by Mike Dacre
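
For readers wondering what the "basic math" amounts to, here is a rough sketch of the standard r-squared and conditional allele frequency calculation from phased haplotypes. It is not the package's actual code: the function `ld_stats` and its input format (a list of `(allele_at_snp1, allele_at_snp2)` tuples, one per haplotype) are assumptions for illustration, and both SNPs are assumed biallelic.

```python
def ld_stats(haplotypes, allele1, allele2):
    """Return r-squared and P(allele2 at SNP2 | allele1 at SNP1).

    haplotypes: list of (allele_at_snp1, allele_at_snp2) tuples, one per
    phased chromosome (hypothetical input format).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")

    # Allele and haplotype frequencies.
    p1 = sum(a == allele1 for a, _ in haplotypes) / n
    p2 = sum(b == allele2 for _, b in haplotypes) / n
    p12 = sum(a == allele1 and b == allele2 for a, b in haplotypes) / n

    denom = p1 * (1 - p1) * p2 * (1 - p2)
    if denom == 0:
        raise ValueError("a SNP is monomorphic for the chosen allele")

    # D is the excess of the observed haplotype frequency over independence.
    d = p12 - p1 * p2
    return {
        "r2": d * d / denom,
        "p_allele2_given_allele1": p12 / p1,
    }
```

The second value is the "directionality" asked about in the question: the fraction of chromosomes carrying `allele1` at the first SNP that also carry `allele2` at the second.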
chrchang523 (United States) wrote:

"plink --r2 in-phase" provides both r-squared and directionality; see https://www.cog-genomics.org/plink/1.9/ld#r .

written 9 weeks ago by chrchang523
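
A sketch of how the plink call above could be scripted for a list of SNPs. The helper names (`plink_r2_command`, `parse_ld_table`), the file prefixes, and the window settings are assumptions; the binary may be installed under a different name than `plink`, and the column names are read from the report header rather than hard-coded, so check them against your own output.

```python
def plink_r2_command(bfile, snp_list, out, window_kb=1000, min_r2=0.2):
    """Build the argument list for a plink 1.9 LD report with phase info.

    bfile:    prefix of the .bed/.bim/.fam fileset (hypothetical path)
    snp_list: text file with one index variant ID per line
    """
    return [
        "plink",
        "--bfile", bfile,
        "--r2", "in-phase",
        "--ld-snp-list", snp_list,
        "--ld-window-kb", str(window_kb),
        "--ld-window", "99999",  # lift the default cap on variants per window
        "--ld-window-r2", str(min_r2),
        "--out", out,
    ]


def parse_ld_table(text):
    """Parse a whitespace-delimited LD report into a list of dicts."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]
```

The command list can be passed to `subprocess.run(cmd, check=True)`, after which the report (written to `<out>.ld`) can be read and handed to `parse_ld_table`.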
cmdcolin (United States) wrote:

The Ensembl REST API offers this function: http://rest.ensembl.org/documentation/info/ld_pairwise_get

written 9 weeks ago by cmdcolin

Note that you can also use 1000GENOMES:phase_3:ALL instead of a specific population, for example: http://rest.ensembl.org/ld/human/pairwise/rs6792369/rs1042779?content-type=application/json;population_name=1000GENOMES:phase_3:ALL

written 9 weeks ago by cmdcolin
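
A small sketch of calling that endpoint from a script, assuming the URL form shown in the example above. The helper names `ensembl_ld_url` and `fetch_pairwise_ld` are hypothetical, and the shape of the returned JSON should be checked against the endpoint documentation rather than taken from this snippet.

```python
import json
from urllib.request import urlopen

ENSEMBL_REST = "http://rest.ensembl.org"


def ensembl_ld_url(snp1, snp2, population="1000GENOMES:phase_3:ALL",
                   species="human"):
    """Build a pairwise LD URL in the same form as the example link."""
    return (
        f"{ENSEMBL_REST}/ld/{species}/pairwise/{snp1}/{snp2}"
        f"?content-type=application/json;population_name={population}"
    )


def fetch_pairwise_ld(snp1, snp2, population="1000GENOMES:phase_3:ALL"):
    """Fetch and decode the JSON response for one SNP pair (needs network)."""
    with urlopen(ensembl_ld_url(snp1, snp2, population), timeout=30) as resp:
        return json.load(resp)
```

Calling `fetch_pairwise_ld("rs6792369", "rs1042779")` would issue the same request as the link above; looping over thousands of pairs should respect the service's rate limits.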

Thanks! That is great, but unfortunately it doesn't include the directionality. What I really need to know is what SNP2 is, given some value for SNP1 (e.g. SNP2 is T 90% of the time when SNP1 is G).

written 9 weeks ago by Mike Dacre

I see! Perhaps the Ensembl team would be interested in adding that function.

written 9 weeks ago by cmdcolin