Question: Obtain the LD matrix from a list of SNP
1
gravatar for janhuang.cn
15 months ago by
janhuang.cn90
janhuang.cn90 wrote:

I have a list of SNPs, and I would like to generate an LD matrix for it.

Below is an example. I obtained the R2 from SNAP [http://archive.broadinstitute.org/mpg/snap/ldsearch.php]. But this will take a lot of time since I have more than a couple of lists of SNPs.

I have also searched on Ensembl, but it seems that I can only search for the LD between two SNPs each time. [Example : http://www.ensembl.org/Homo_sapiens/Variation/HighLD?db=core;r=7:127081784-127082784;v=rs2283094;vdb=variation;vf=1684883;second_variant_name=rs2283095]

I have also looked at the LDheatmap package in R. But it seems the calculation of LD is based on a SNP information from a study, i.e. one need to provide the genotype of each participants. The snpStats package in Bioconductor seems to serve the same purpose. So these two methods do not seem to fit my situation.

Is there any way I could extract the LD matrix (R2) for CEU population, based on HapMap 22 dataset or 1000G dataset? (Considering I have a lot of lists of SNPs)

Example:

SNP
rs2283094
rs2283095
rs6467111

LD matrix (R2)
                     rs2283094    rs2283095    rs6467111
    rs2283094            1          0.143         0.042 
    rs2283095          0.143          1           0.297 
    rs6467111          0.042        0.297           1
ld snp • 2.3k views
ADD COMMENTlink modified 5 months ago by zx87545.3k • written 15 months ago by janhuang.cn90
1
gravatar for aays
15 months ago by
aays140
Canada
aays140 wrote:

If you'd like to use data from the 1000 Genomes project, it's possible to download VCF files and use PLINK 1.9, an extremely fast command line tool for working with genomic data. Briefly, the workflow would be something along the lines of:

  1. Use --make-bed to create binary fileset from the VCF;
  2. Create a file listing all the variants of interest;
  3. Use PLINK's LD functions (namely --r2) on the binary fileset, providing your variant list as input to the --ld-snp-list argument.

PLINK primarily allows for calculation of the r2 statistic, but can can also return pairwise D or D' if specified. These can be returned in either table or matrix format. Also - I see that you're also looking for SNPs where r2 > 0.8. In this case, you can use the --ld-window-r2 argument in the --r2 command to filter out any SNP pairs with r2 below that value.

ADD COMMENTlink written 15 months ago by aays140

I downloaded the 1000 Genomes phase 3 VCF files from this FTP folder [ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/]. I wonder if the one named " " covers chr 1-22? I suspect "wgs" means whole genome, but it seems to me the file size not that big, comparing to the separated files of chr1 to chr22.

I also try plink on my PC, I already put the plink file in the same folder where I store the VCF file. Then I use this command line

plink --vcf ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf --make-bed --out binary_fileset

It returns error: Failed to open ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf

Any idea what is wrong?

ADD REPLYlink modified 14 months ago • written 14 months ago by janhuang.cn90
0
gravatar for Emily_Ensembl
15 months ago by
Emily_Ensembl16k
EMBL-EBI
Emily_Ensembl16k wrote:

You can get linkage across a region in Ensembl.

ADD COMMENTlink modified 15 months ago • written 15 months ago by Emily_Ensembl16k

But the SNPs I have may not be on the same chromosome.

ADD REPLYlink written 15 months ago by janhuang.cn90

Then they won't be in LD

ADD REPLYlink written 15 months ago by Emily_Ensembl16k

Can I get the linkage across the entire chromosome in Ensembl? It seems that it can not display since "The region you have selected is too large to display linkage data; a maximum region of 75kb is allowed. Please change the region using the navigation controls above."

ADD REPLYlink modified 15 months ago • written 15 months ago by janhuang.cn90

The web-page has a limit of 75kb and the REST-API endpoint. In principle, there is no size limit on getting LD using the Perl API, but in practice the load would be so much that your computer would just explode.

The limits match up to what we actually think will be in LD. Beyond that, variants are so unlikely to be in LD then there's no point in pretending to calculate it.

ADD REPLYlink written 15 months ago by Emily_Ensembl16k

Thank you. It is good to know this. But in my case, I have a list of SNPs (may be from different chromosomes), and I want to obtain the LD between them (those in the same chromosomes). So I think I have to obtain this from a comprehensive source, since I have too many SNPs, I do not want to manually search it on any website.

ADD REPLYlink written 15 months ago by janhuang.cn90

How big is your list? Do you just want to compare them to each other? You could use the Perl or REST APIs to get pairwise LD for each (perhaps with a chromosome filter) and print to a table of your own design.

ADD REPLYlink written 15 months ago by Emily_Ensembl16k

If I have only two lists, each with less than 10 SNPs, I can manually search on Ensemble. But I have more than a hundred lists, and each list may have different number of SNPs. I have two purposes: 1) To generate the LD matrix for each list of SNPs; 2) To identify SNPs with R2>= 0.8 within each list.

ADD REPLYlink written 15 months ago by janhuang.cn90
0
gravatar for zx8754
5 months ago by
zx87545.3k
London
zx87545.3k wrote:

We can use LDlink, either use their website LDlink.

Or use commandline, example below. More info about programmatic access see here:

To get D' prime:

$ curl -k -X GET 'https://analysistools.nci.nih.gov/LDlink/LDlinkRest/ldmatr ix?snps=rs2283094%0Ars2283095%0Ars6467111&pop=CEU&r2_d=d'
RS_number       rs2283094       rs2283095       rs6467111
rs2283094       1.0             1.0             1.0
rs2283095       1.0             1.0             1.0
rs6467111       1.0             1.0             1.0

And for R2:

$ curl -k -X GET 'https://analysistools.nci.nih.gov/LDlink/LDlinkRest/ldmatr ix?snps=rs2283094%0Ars2283095%0Ars6467111&pop=CEU&r2_d=r'
RS_number       rs2283094       rs2283095       rs6467111
rs2283094       1.0             0.188           0.329
rs2283095       0.188           1.0             0.062
rs6467111       0.329           0.062           1.0
ADD COMMENTlink written 5 months ago by zx87545.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2147 users visited in the last hour