How to use 1000 Genomes data for LDheatmap package in R
6.5 years ago
mqzhu • 0

I am trying to visualize LD blocks in the 1 Mb flanking a SNP. I don't want to use Haploview because it relies on HapMap 3 (build 17 assembly), which is quite outdated. So I downloaded SNP data from 1000 Genomes phase 3 using the online tool "VCF to PED converter", which gave me .ped and .info files. I then tried the R package 'LDheatmap', which calculates LD as r^2 and displays it as a heatmap, but the .ped and .info files from 1000 Genomes are not compatible input for LDheatmap.

The example data set for LDheatmap, "CEUData", contains a data frame and a vector. The format is like this:

• CEUSNP: A data frame of SNP genotypes. Each row represents an individual and each column a SNP; the column headers are the SNP IDs.
• CEUDist: A vector of integers, representing SNP physical map locations on the chromosome.

Does anyone know how to convert the .ped and .info files from 1000 Genomes into compatible input (a data frame and a vector) for the LDheatmap package in R?

SNP R lingkage ldheatmap heatmap • 3.5k views

Did you ever fix this?

9 months ago
Rashmi ▴ 20

Hi, I hope you have worked this out by now. For others who land here, I hope the following helps.

Once you have the .ped and .info files, use the second script at this link to create a MAP file from the INFO file: https://davetang.org/muse/2016/07/28/vcf-to-ped/
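If the linked script is not to hand, the same conversion can be sketched with awk. This is only a rough sketch, not the script from the link. It assumes the .info file has two whitespace-separated columns (SNP ID, then base-pair position) and that every SNP is on the same chromosome. The helper name info_to_map, the chromosome number, and the file names are placeholders of mine.

```shell
# Hypothetical helper: build a PLINK .map file from a two-column .info file.
# Assumed .info columns: SNP ID, base-pair position.
# .map columns: chromosome, SNP ID, genetic distance (0 = unknown), bp position.
info_to_map() {
  chr="$1"    # chromosome code to write in column 1
  info="$2"   # input .info file
  map="$3"    # output .map file
  # Skip lines with fewer than two fields; emit tab-separated .map rows.
  awk -v chr="$chr" 'NF >= 2 { print chr "\t" $1 "\t0\t" $2 }' "$info" > "$map"
}

# Example call (placeholders; adjust chromosome and file names):
# info_to_map 1 eur.info eur.map
```

The SNP order in the .map file has to match the column order in the .ped file, which holds as long as the .info file is used line by line as it came from the converter.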

Once you have your map file, you can use PLINK to get your LD matrix, which can then be plotted in R. I haven't used LDheatmap but have used LD.plot from package 'gaston' in R to plot LD.

1) To get the LD matrix, give your ped and map files the same base name, e.g. eur.ped and eur.map. The command below writes the square r^2 matrix to eur.ld:

plink1.9 --file eur --r2 square --out eur

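2) To draw your LD plot in R:

library(gaston)
m <- as.matrix(read.table("eur.ld"))   # r^2 matrix written by PLINK in step 1
names <- read.table("eur.info")        # assumed: column 1 = SNP ID, column 2 = position
rownames(m) <- names[,1]               # SNP labels are expected in the row names
LD.plot(m, names[,2], draw.chr=TRUE, write.snp.id=TRUE)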