Question: How to get 1000 Genomes data in bulk?
7 months ago, kynnjo (20), United States, wrote:

I am looking for an efficient way to get 1000 Genomes data for ~70k dbSNP IDs. (I am primarily interested in putative impact and allele frequencies.)

Is there a convenient way to do this?

A good solution would be some way to query 1000 Genomes programmatically and in bulk (as opposed to one dbSNP ID at a time), but I have not found one yet.

Another possibility would be to download files from 1000 Genomes that I can process locally, but I have not been able to locate a reasonably sized download that has the information I'm looking for. I could download all of ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502, but that could take a long time and pretty much fill up my hard disk, without any guarantee that what I'm looking for is in that massive download.

modified 7 months ago by Kevin Blighe (39k) • written 7 months ago by kynnjo (20)

If you are able to use Amazon AWS then the data is available there and won't require a download.

written 7 months ago by genomax (63k)

How to get 1000 Genomes data in bulk?

The title of this post could be refined to state your exact requirement.

You know how to get the data in bulk, but you are looking for an efficient way to get only the data for the 70k dbSNP IDs that you want. Is that correct?

modified 7 months ago • written 7 months ago by genomax (63k)

7 months ago, Pierre Lindenbaum (118k), France/Nantes/Institut du Thorax - INSERM UMR1087, wrote:

The following Java program should do the trick:

written 7 months ago by Pierre Lindenbaum (118k)

7 months ago, Kevin Blighe (39k), Republic of Ireland, wrote:

I thought I would give a quick answer, as this thread will likely get a fair bit of traffic in the future.

If you follow steps 1-3 of my tutorial, here: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format, then you'll have the data in BCF format. On my disk, the entire 1000 Genomes dataset (phased genotypes) in a single BCF file occupies 8.3 gigabytes. I interrogate it frequently for diverse projects.
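
Once the data is on local disk, pulling out a fixed list of dbSNP IDs comes down to one pass over the records with a set lookup on the ID column. Below is a minimal Python sketch of that idea, not the code from the tutorial. It assumes the data is available as (optionally gzipped) VCF text rather than BCF, and that the allele frequency is stored under the `AF` key of the INFO column, as in the phase 3 release VCFs. The function names (`parse_info`, `select_by_rsid`, `open_vcf`) are made up for illustration.

```python
import gzip


def parse_info(info_field):
    """Turn a VCF INFO string such as 'AC=1;AF=0.25;DB' into a dict."""
    info = {}
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        # Flag entries (e.g. 'DB') have no '=' and are stored as True.
        info[key] = value if sep else True
    return info


def select_by_rsid(vcf_lines, wanted_ids):
    """Yield (rsid, chrom, pos, ref, alt, af) for records whose ID is wanted.

    vcf_lines  -- any iterable of VCF text lines (an open file works)
    wanted_ids -- a set of dbSNP IDs, e.g. {"rs123", "rs456"}
    The AF value is returned as the raw string from INFO (comma-separated
    for multi-allelic sites), or None if the record has no AF entry.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        # Split off only the 8 fixed columns; genotype columns stay unsplit.
        fields = line.rstrip("\n").split("\t", 8)
        chrom, pos, ids, ref, alt, _qual, _filter, info = fields[:8]
        # The ID column may hold several identifiers separated by ';'.
        for rsid in ids.split(";"):
            if rsid in wanted_ids:
                yield rsid, chrom, int(pos), ref, alt, parse_info(info).get("AF")


def open_vcf(path):
    """Open a plain or gzip-compressed VCF file for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")
```

To use it, read the ~70k IDs into a set (one ID per line in a text file, say) and loop over `select_by_rsid(open_vcf(path), wanted)`, where `path` is whichever VCF you have locally. The set lookup keeps the cost per record constant, so the run time is dominated by reading the file.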

Kevin
