Getting The LD Data For The 1000 Genomes
3
5
13.5 years ago
Michael 54k

Hi folks,

In another question, the OP mentioned that he was able to find some preliminary LD data for the 1000 Genomes Project. I was unable to find the file mentioned there, but I got the impression that LD data for the project already exists. Can anyone tell me where it is?

linkage genome • 9.5k views
0

The 1000G files from the website are best approached with a tool that keeps the phasing information intact, which is one of the benefits of 1000G data. For that, haploxt is recommended: http://genome.sph.umich.edu/wiki/Haploxt It can be used on the files downloaded from here.

If you want to do some PLINK calculations, I can try to help by offering a script that extracts such data from the phased file. I'm working on that at the moment.

0

It has been almost 3 years since this question was asked. Did you manage to find the LD data?

1

I calculated it myself from the phase 1 VCF, using this answer: "1000 genomes LD calculation"

4
13.5 years ago

As far as I know, no such information is currently available to the public, at least not on the official FTP site (you can always check the whole up-to-date site tree here).

Ryan D posted this comment on the "1000 genomes LD calculation" question mentioned above:

Some data exists on this LD. I ended up pulling data from files with code like this:

    zcat ~/1000Genomes/2010-06/CEU/LD/xt/chr19.xt.gz | grep -w rs11671664 | awk '$4 > 0.5'

    rs11670375 rs11671664 0.8961 0.5673 A,G
    chr19:50848886 rs11671664 0.8990 0.7340 C,G
    rs11083777 rs11671664 0.8990 0.7340 G,G
    chr19:50851826 rs11671664 0.8899 0.7134 C,G
    chr19:50852809 rs11671664 0.8990 0.7340 A,G
    chr19:50853145 rs11671664 0.8990 0.7340 G,G
    rs4375772 rs11671664 0.8899 0.7134 C,G
    rs11671664 chr19:50865055 1.0000 0.6138 G,G

Unfortunately, I haven't been able to find that data on the 1000 Genomes FTP server. It would definitely be interesting to know exactly where this data is available.
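For anyone who does have a file in that layout, the quoted zcat | grep | awk pipeline could be approximated in Python roughly as below. This is only a sketch under assumptions: the function names `filter_ld_rows` and `filter_ld_file` are made up for illustration, the file is assumed to be whitespace-delimited with the two variant IDs in the first two columns, and the fourth column is compared as a number without assuming which LD statistic it holds (the quoted comment does not say). Unlike `grep -w`, it matches the SNP only against the first two columns.

```python
import gzip
from typing import Iterable, Iterator, List


def filter_ld_rows(lines: Iterable[str], snp: str,
                   min_col4: float = 0.5) -> Iterator[List[str]]:
    """Yield rows that involve `snp` and whose 4th column exceeds `min_col4`.

    Assumption: whitespace-delimited rows with the two variant IDs in
    columns 1 and 2, mirroring the output quoted in the thread.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or truncated line
        if snp not in fields[:2]:
            continue  # simplification of `grep -w`: exact match on the ID columns
        try:
            value = float(fields[3])
        except ValueError:
            continue  # non-numeric 4th column, e.g. a header line
        if value > min_col4:  # same comparison as awk '$4 > 0.5'
            yield fields


def filter_ld_file(path: str, snp: str, min_col4: float = 0.5) -> List[List[str]]:
    """Read a gzip-compressed LD table and return the matching rows."""
    with gzip.open(path, "rt") as handle:
        return list(filter_ld_rows(handle, snp, min_col4))
```

A call would look like `filter_ld_file("chr19.xt.gz", "rs11671664")`, where the path is whatever local copy of such a file you have.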

0

yep, exactly, I was desperately looking for this

4
13.5 years ago
Mary 11k

Just bumping this with a new tidbit: the 1000 Genomes paper is out today in Nature:

A map of human genome variation from population-scale sequencing

I'm just starting it, but it may have some guidance on things people are interested in.

3
13.5 years ago
Ryan D ★ 3.4k

The 1000G files from the website are best approached with a tool that keeps the phasing information intact, which is one of the benefits of 1000G data. For that, haploxt is recommended: genome.sph.umich.edu/wiki/Haploxt It can be used on the files downloaded from here: sph.umich.edu/csg/abecasis/MACH/download/…

If you want to do some PLINK calculations, I can try to help by offering a script that extracts such data from the phased file. I'm working on that at the moment.
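To illustrate why phase matters here: with phased data, each haplotype gives the alleles at both sites directly, so the haplotype frequency needed for LD can be counted rather than estimated. The sketch below computes the standard pairwise statistics |D'| and r² from two 0/1 allele vectors (one entry per haplotype). It is not haploxt's or PLINK's implementation, and the function name `pairwise_ld` and the 0/1 encoding are assumptions for the example.

```python
from typing import Sequence, Tuple


def pairwise_ld(hap_a: Sequence[int], hap_b: Sequence[int]) -> Tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic sites given phased haplotypes.

    hap_a[i] and hap_b[i] are the alleles (0 or 1) carried by haplotype i
    at site A and site B respectively.
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotype vectors must be non-empty and equal in length")

    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at site A
    p_b = sum(hap_b) / n  # frequency of allele 1 at site B
    # Frequency of haplotypes carrying allele 1 at both sites
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("LD is undefined when a site is monomorphic")

    r2 = d * d / denom

    # D' scales D by its maximum possible magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max

    return d_prime, r2
```

For example, `pairwise_ld([0, 0, 1, 1], [0, 0, 1, 1])` describes two sites whose alleles always travel together, while `pairwise_ld([0, 0, 1, 1], [0, 1, 0, 1])` describes two sites that are independent in this tiny sample.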

2

It looks like SNAP has been updated with the 1000 Genomes pilot 1 data. So if you don't mind being a bit out of date, you can use that for quick calculations: http://www.broadinstitute.org/mpg/snap/ldsearch.php

Under SNP data set, choose "1000 Genomes pilot 1".

We're now using a Ruby script that pulls the region of interest from the 1000G data and calculates LD with PLINK. It's sub-optimal, but better than using pilot 1 data.
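The Ruby script itself isn't posted here, so the following is not that script. It is only a rough Python sketch of the "pull the region of interest" step, assuming the input is a plain-text, tab-delimited VCF with phased genotypes (`0|1` style) in the GT field. The function name `haplotypes_in_region` is hypothetical, and the LD calculation would still be handed off to PLINK or another tool.

```python
from typing import Dict, Iterable, List


def haplotypes_in_region(vcf_lines: Iterable[str], chrom: str,
                         start: int, end: int) -> Dict[str, List[int]]:
    """Collect phased haplotype alleles for variants in chrom:start-end.

    Returns {variant_id: [allele per haplotype]}, two haplotypes per
    diploid sample. Variants with unphased or missing genotypes are skipped.
    """
    result: Dict[str, List[int]] = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 or fields[0] != chrom:
            continue  # no sample columns, or a different chromosome
        pos = int(fields[1])
        if not (start <= pos <= end):
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_index = fmt.index("GT")

        alleles: List[int] = []
        usable = True
        for sample in fields[9:]:
            gt = sample.split(":")[gt_index]
            parts = gt.split("|")  # "|" marks a phased genotype
            if len(parts) != 2 or not all(p.isdigit() for p in parts):
                usable = False  # unphased ("/"), haploid, or missing (".")
                break
            alleles.extend(int(p) for p in parts)
        if not usable:
            continue

        # Fall back to chrom:pos when the ID column is empty (".")
        variant_id = fields[2] if fields[2] != "." else f"{chrom}:{pos}"
        result[variant_id] = alleles
    return result
```

For a gzip-compressed VCF, the lines could be supplied with `gzip.open(path, "rt")`. For whole-genome files, an indexed lookup would be much faster than scanning every line like this.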
