Question: 1000 Genomes Indel/SNP Files
1
9.3 years ago by
Mpiro40
United States
Mpiro40 wrote:

Is there anywhere to download the 1000 Genomes SNP and indel data as a single merged file for each pilot (pilot 1, 2, and 3)? I am using GATK and thought I could use the 1KG indel and SNP VCF files to realign/recalibrate my calls.

I appreciate your input.

genome indel gatk • 5.2k views
written 9.3 years ago by Mpiro40

Thank you all, I appreciate your input.

written 9.3 years ago by Mpiro0
1

mpiro, it's nice that you wish to thank people for their input, but please do it as a comment on the question or on their answer; don't post an additional 'answer'. If you feel one of these replies really helped you, mark it as the 'Accepted answer'.

written 9.3 years ago by Daniel Swan13k
3
9.3 years ago by
Palo Alto
Michael.James.Clark560 wrote:

Have you checked the 1000 Genomes site?

They had a major release of all the merged phase 1 data in November 2010; you can obtain the VCF files from the ftp link here.

modified 8 months ago by RamRS27k • written 9.3 years ago by Michael.James.Clark560
2
9.3 years ago by
Jorge Amigo11k
Santiago de Compostela, Spain
Jorge Amigo11k wrote:

The only completely merged set that 1000 Genomes has released is its final release of 629 individuals, named Phase I, which currently appears as the latest news on the project's home page. It can be downloaded from EBI or NCBI, depending on your location.

The summary of all 3 pilots has been put in a single place (again, mirrored on both EBI and NCBI), although the data haven't been merged into a single compressed VCF file as in the final release. Instead, you can download the individuals' data merged by population of origin. You will find the pilot 1 data under the "low coverage" folder (~180 individuals sequenced at ~2-4x), the pilot 2 data under the "trio" folder (6 individuals making up 2 trios, sequenced at ~20-60x), and the pilot 3 data under the "exon" folder (~1000 genes from ~900 individuals sequenced at ~50x).

written 9.3 years ago by Jorge Amigo11k
3

Also, the most up-to-date version of the pilot data is at ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/pilot_data/paper_data_sets/a_map_of_human_variation/ and includes dbSNP 132 rs numbers for the pilot SNPs.

written 9.3 years ago by Laura1.7k
1

Not exactly redundant, but complementary ;) If what mpiro wants is a single merged file, then the Phase I data you mentioned should do. But if he needs data specifically from the pilot studies, I just wanted to point out that these have been released grouped by population, and that they can be downloaded from 2 different main mirrors. Also, a rough description of what the 3 pilot studies cover may be of interest to any BioStar reader arriving at this post who is not familiar with them.

written 9.3 years ago by Jorge Amigo11k

Seems redundant with my answer.

written 9.3 years ago by Michael.James.Clark560

This isn't the phase 1 release. It is an early release of main project SNPs based on the August alignments.

written 9.3 years ago by Laura1.7k

You're right, Laura: Phase I is not yet finished. I just meant to say that it is part of the final data and not from the pilot studies, so thanks for clarifying. The readme file describes this data as "An interim analysis of Phase I data was carried out based on the 2010.08.04 sequence index, which included 629 sequenced samples." Thank you also for pointing out the most up-to-date release of the pilot studies. I wasn't aware of that update, as it was published in the new "paper_data_sets" ftp folder instead of the "release" folder I was periodically checking.

written 9.3 years ago by Jorge Amigo11k
2
9.3 years ago by
Laura1.7k
Cambridge UK
Laura1.7k wrote:

There is no merged set of variants for each pilot project.

I would suggest downloading the VCF files you want from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/pilot_data/paper_data_sets/a_map_of_human_variation/ (remember that these are mapped to NCBI36, not GRCh37) and then using a tool like this one to merge the files together.
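
For readers who want to see what combining the files involves, here is a minimal Python sketch. It is not the tool linked above. It assumes the simplest case: a SNP file and an indel file for the same population, with identical sample columns in the same order. The function names (`read_vcf`, `chrom_key`, `combine_vcfs`) are hypothetical. A dedicated VCF tool is the better choice for real work, since it can handle differing sample sets, header reconciliation, and compressed, indexed output.

```python
import gzip


def read_vcf(path):
    """Return (meta_lines, column_header, records) for a plain or gzipped VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    meta, header, records = [], None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith("##"):
                meta.append(line)            # meta-information lines
            elif line.startswith("#CHROM"):
                header = line                # column header, including sample names
            elif line:
                records.append(line.split("\t"))
    return meta, header, records


def chrom_key(chrom):
    """Sort numeric chromosomes numerically, then X, Y, MT and others by name."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return (0, int(name), "") if name.isdigit() else (1, 0, name)


def combine_vcfs(paths, out_path):
    """Concatenate records from VCFs that share the same sample columns.

    Assumption: every input has exactly the same #CHROM header line.
    Returns the number of records written.
    """
    all_meta, header, records = [], None, []
    for path in paths:
        meta, file_header, file_records = read_vcf(path)
        if file_header is None:
            raise ValueError(f"{path}: no #CHROM header line found")
        if header is None:
            header = file_header
        elif file_header != header:
            raise ValueError(f"{path}: columns differ from the first file")
        for line in meta:
            if line not in all_meta:         # keep each meta line once, in order
                all_meta.append(line)
        records.extend(file_records)

    # Stable sort by chromosome, then position.
    records.sort(key=lambda rec: (chrom_key(rec[0]), int(rec[1])))

    with open(out_path, "w") as out:
        for line in all_meta:
            out.write(line + "\n")
        out.write(header + "\n")
        for rec in records:
            out.write("\t".join(rec) + "\n")
    return len(records)
```

A call such as `combine_vcfs(["snps.vcf.gz", "indels.vcf.gz"], "combined.vcf")` (placeholder file names) would write one position-sorted file. Because the pilot files are on NCBI36, the combined file keeps those coordinates, and it would need to match the reference build of your alignments before being used with GATK.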

modified 8 months ago by RamRS27k • written 9.3 years ago by Laura1.7k