Question

How do I compute ld blocks from the hapmap ld_data?

1

Entering edit mode

4.4 years ago

endrebak ▴ 960

When people ask for ld blocks on this forum, this link is often posted:

ftp://ftp.ncbi.nlm.nih.gov/hapmap/ld_data/latest

However, those files merely contain the correlations between snps. How do I use these to compute ld-blocks?

Would be nice if you could post an example script or invocation of a tool, not merely post a link :)

Thank you!

ld snps • 3.5k views

ADD COMMENT • link updated 2.7 years ago by Rashmi ▴ 20 • written 4.4 years ago by endrebak ▴ 960

score 6 · Accepted Answer · 2019-12-05

The contents of those files are explained in the README; however, I'm not sure how well supported is that format specification (if it is even defined as a format specification).

I have an imputed 1000 Genomes Phase III dataset on my computer that I utilise for various projects —including LD analyses— when needed. If you follow this tutorial, you'll also have this imputed dataset: Produce PCA bi-plot for 1000 Genomes Phase III - Version 2

Once you have all of those files, generating a LD heatmap for any region is as simple as this, taken from my notes:

1, extract variants overlapping the region of interest +/- 1Kbp from the 1000 Genomes Phase III data

bcftools view chr3.1kg.phase3.v5.vcf.gz -R Gene_hg19.bed | \
  bcftools annotate -Ob -I +'%CHROM:%POS:%REF:%ALT' > Region.bcf ;
bcftools index Region.bcf ;

Here, the co-ordinates of the region of interest are stored in the BED file. I also set the VCF ID field to CHROM:POS:REF:ALT if it's not already set

2, extract different populations

bcftools view -Ob --samples-file EAS.list Region.bcf > EAS.bcf ;
bcftools index EAS.bcf ;

Here, EAS.list comprises a list of sample IDs representing the East Asian (EAS) super population. I identified these from 20130606_g1k.ped

3, convert the BCF file to PLINK format (requires PLINK >= 1.9)

plink --noweb --bcf EAS.bcf \
  --keep-allele-order \
  --vcf-idspace-to _ \
  --const-fid \
  --allow-extra-chr 0 \
  --split-x b37 no-fail \
  --make-bed \
  --out EAS ;

4, export data to HaploView format

plink --noweb --bfile EAS \
  --chr 3 --from-bp 135364584 --to-bp 135399507 \
  --snps-only no-DI \
  --recodeHV \
  --out EAS_Haploview ;

5, determine haplotype blocks in HaploView

I have written the following notes:

Start HaploView. In the ‘Welcome to HaploView’ window, select the ‘LINKAGE Format’ tab. Then, click on the ‘browse’ button and select your PED file as ‘Data File’ - this is the PED file that you generated from PLINK.

Click ‘OK’.

Select the ‘LD Plot’ tab to generate a LD heatmap. To save the plot as a Portable Network Graphics (PNG) file, go to File > Export current tab to PNG

6, LD heatmap in R

Then, you can read that HaploView-exported file back into R to mess around with it:

require(png)

par(mar=c(0,0,0,0), mfrow=c(4,1))

haploblocks <- readPNG("EAS_Haploblocks.png")
plot(1:2, type='n', main="", xlab="", ylab="", xaxt="n", yaxt="n", axes=FALSE)
lim <- par()
rasterImage(haploblocks, lim$usr[1], lim$usr[3], lim$usr[2], lim$usr[4]-0.2)

require(SDMTools)
require(RColorBrewer)
points <- cbind(x=c(1.925, 1.96, 1.96, 1.925), y=c(1.5, 1.5, 1.1, 1.1))
legend.gradient(points, cols=brewer.pal(n=9, name="Greys"), title=bquote(R^2), limits=c(0,1))

NB - the gene track I added also via R, and also via readPNG(). I previously downloaded it from UCSC and saved it as a separate PNG

Kevin

score 2 · Accepted Answer · 2019-12-06

2

Entering edit mode

4.4 years ago

endrebak ▴ 960

Method: https://academic.oup.com/bioinformatics/article/32/2/283/1743626

Data: https://bitbucket.org/nygcresearch/ldetect-data/src/master/EUR/fourier_ls-all.bed