Question: [SOLVED] Get LD data for any two SNPs
Mike Dacre (Stanford, CA) wrote, 12 months ago:

Is there a good way to take any two SNPs and pull out the LD between them, particularly the R-squared and the directionality of the linkage (e.g. A in SNP1 occurs with G in SNP2 95% of the time)?

Obviously I can do this manually, but I want to do it for a list of several thousand SNPs, so I am hoping for a scalable solution. Right now I can't find anything, and it looks like I will have to come up with my own solution using vcftools and the 1000 Genomes data.

Thanks!

Modified 11 months ago • written 12 months ago by Mike Dacre

It turns out that LDLink does this beautifully for a single pair of rsids, but it doesn't seem to work in batch mode: https://analysistools.nci.nih.gov/LDlink/

Reply written 12 months ago by Mike Dacre

You might also be able to get your results from LDlink with some scripting, although that possibly requires parsing the HTML. For example, you can construct the URLs programmatically pretty easily: https://analysistools.nci.nih.gov/LDlink/?var1=rs1042779&var2=rs6792369&pop=YRI%2BLWK%2BGWD%2BMSL%2BESN%2BASW%2BACB%2BMXL%2BPUR%2BCLM%2BPEL%2BCHB%2BJPT%2BCHS%2BCDX%2BKHV%2BCEU%2BTSI%2BFIN%2BGBR%2BIBS%2BGIH%2BPJL%2BBEB%2BSTU%2BITU&tab=ldpair
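A minimal sketch of building such URLs in Python, using only the query parameters visible in the example link above (`var1`, `var2`, `pop`, `tab`). The helper name `ldpair_url` and the two-population list are illustrative assumptions, and fetching or parsing the returned page is not shown, since the page structure is not described in this thread.

```python
from urllib.parse import urlencode

# Base address taken from the link posted in this thread.
LDLINK_BASE = "https://analysistools.nci.nih.gov/LDlink/"

def ldpair_url(snp1, snp2, populations):
    """Build an LDlink LDpair URL for one pair of rsIDs (hypothetical helper)."""
    query = urlencode({
        "var1": snp1,
        "var2": snp2,
        # Populations are joined with "+", which urlencode escapes as %2B,
        # matching the form of the example URL.
        "pop": "+".join(populations),
        "tab": "ldpair",
    })
    return LDLINK_BASE + "?" + query

# One URL per SNP pair; extend the list for a batch.
pairs = [("rs1042779", "rs6792369")]
urls = [ldpair_url(a, b, ["CEU", "YRI"]) for a, b in pairs]
```

For thousands of pairs it would be sensible to pause between requests, since the query volume the server tolerates is unknown.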

Reply written 12 months ago by cmdcolin

Yes! It looks like that does work. I am not sure how many queries their API can tolerate, but I am going to try this tonight to see. I also want to compare this to running the calculations with plink/vcftools to see which is faster and more stable. Thanks!

Reply written 12 months ago by Mike Dacre

plink 1.9 offers this functionality, as answered by @chrchang523.

In case anyone else wants to do this, I wrote a little package based on plink and LDlink that allows many-to-many LD lookup. Basically, given two SNP lists, it creates a list of SNP LD pairs between each SNP in the first list and every SNP in the second list, filtered by distance and r-squared. Given a first list of 40,000 SNPs and a second list of ~10 million risk alleles, it runs in a couple of hours.

The output includes phase data for the SNPs to answer the question 'given allele X in SNP A, what is the allele in SNP B?' for every possible pair.

All of this is just done by some basic math, running plink a bunch of times, and parsing the output. Hopefully it helps someone else.
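As an illustration of the kind of "basic math" involved (not the package's actual code), here is a sketch that computes r-squared and the conditional allele frequency for two biallelic SNPs. It assumes phased haplotype counts are already available; the function name and the example counts are made up for illustration.

```python
def ld_stats(n11, n12, n21, n22):
    """Return (r2, cond) from counts of the four haplotypes.

    n11: SNP1 allele 1 with SNP2 allele 1
    n12: SNP1 allele 1 with SNP2 allele 2
    n21: SNP1 allele 2 with SNP2 allele 1
    n22: SNP1 allele 2 with SNP2 allele 2
    Assumes both SNPs are polymorphic (no allele frequency of 0 or 1).
    """
    total = n11 + n12 + n21 + n22
    p1 = (n11 + n12) / total          # frequency of allele 1 at SNP1
    q1 = (n11 + n21) / total          # frequency of allele 1 at SNP2
    d = n11 / total - p1 * q1         # linkage disequilibrium coefficient D
    r2 = d * d / (p1 * (1 - p1) * q1 * (1 - q1))
    cond = n11 / (n11 + n12)          # P(SNP2 allele 1 | SNP1 allele 1)
    return r2, cond

# Hypothetical counts from 100 phased haplotypes.
r2, cond = ld_stats(45, 5, 5, 45)
```

Here `cond` is the "SNP2 is T x% of the time when SNP1 is G" number, while `r2` summarises the overall correlation between the two sites.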

Reply written 11 months ago by Mike Dacre

chrchang523 (United States) wrote, 12 months ago:

"plink --r2 in-phase" provides both r-squared and directionality; see https://www.cog-genomics.org/plink/1.9/ld#r .
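A hedged sketch of how this could be scripted from Python. The flags used (`--bfile`, `--r2 in-phase`, `--ld-snp-list`, `--ld-window-kb`, `--ld-window`, `--ld-window-r2`, `--out`) are plink 1.9 options, but the file names and thresholds are placeholders. The parser assumes the in-phase report gives allele pairs in a form like `AG/CT` (A travels with G, C with T) with single-base alleles; check the linked documentation for the exact column layout and for how longer alleles are written.

```python
def plink_r2_command(bfile, snp_list, out, window_kb=1000, min_r2=0.2):
    """Assemble a plink 1.9 command line for an in-phase r2 report."""
    return [
        "plink", "--bfile", bfile,
        "--r2", "in-phase",
        "--ld-snp-list", snp_list,       # file with one index SNP ID per line
        "--ld-window-kb", str(window_kb),
        "--ld-window", "99999",          # do not limit by variant count
        "--ld-window-r2", str(min_r2),   # drop pairs below this r2
        "--out", out,
    ]

def parse_phase(phase):
    """Turn an assumed 'AG/CT' string into {'A': 'G', 'C': 'T'}."""
    pairs = phase.split("/")
    if len(pairs) != 2 or any(len(p) != 2 for p in pairs):
        raise ValueError(f"unexpected phase string: {phase!r}")
    # First character: allele at SNP1; second: in-phase allele at SNP2.
    return {p[0]: p[1] for p in pairs}

# Placeholder inputs; pass cmd to subprocess.run(cmd, check=True) to execute.
cmd = plink_r2_command("my_genotypes", "snps.txt", "ld_out")
lookup = parse_phase("AG/CT")
```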

Written 12 months ago by chrchang523

cmdcolin (United States) wrote, 12 months ago:

The Ensembl REST API offers this function: http://rest.ensembl.org/documentation/info/ld_pairwise_get

Written 12 months ago by cmdcolin

Note that instead of a specific population you can also use a combined one such as 1000GENOMES:phase_3:ALL, for example: http://rest.ensembl.org/ld/human/pairwise/rs6792369/rs1042779?content-type=application/json;population_name=1000GENOMES:phase_3:ALL
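A small sketch of calling this endpoint from Python with only the standard library. The path and the `population_name` parameter come from the example URL above; the function names are illustrative, and the JSON content type is requested through a header rather than in the query string. The shape of the returned records is not assumed here, so inspect the result before relying on particular fields.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

ENSEMBL = "http://rest.ensembl.org"

def pairwise_ld_url(snp1, snp2, population=None, species="human"):
    """Build the pairwise LD endpoint URL for two variant IDs."""
    url = f"{ENSEMBL}/ld/{species}/pairwise/{quote(snp1)}/{quote(snp2)}"
    if population:
        # Keep ":" unescaped so the population name reads as in the example.
        url += "?" + urlencode({"population_name": population}, safe=":")
    return url

def fetch_pairwise_ld(snp1, snp2, population=None):
    """Query the endpoint and return the decoded JSON (needs network access)."""
    request = Request(
        pairwise_ld_url(snp1, snp2, population),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request) as response:
        return json.load(response)

url = pairwise_ld_url("rs6792369", "rs1042779", "1000GENOMES:phase_3:ALL")
```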

Reply written 12 months ago by cmdcolin

Thanks! That is great, but unfortunately it doesn't include the directionality. What I really need to know is what SNP2 is given some value for SNP1 (i.e. SNP2 is T 90% of the time when SNP1 is G).

Reply written 12 months ago by Mike Dacre

I see! Perhaps the Ensembl team would be interested in adding that function.

Reply written 12 months ago by cmdcolin