1000 Genomes Indel/SNPs File
3
1
11.3 years ago
Mpiro ▴ 40

Is there any place to download the 1000 Genomes SNP and indel data as a single merged file for each pilot (pilot 1, 2, and 3)? I am trying to use GATK and planned to use the 1KG indel and SNP VCF files to realign/recalibrate my calls.

I appreciate your input.

genome indel gatk • 5.9k views
ADD COMMENT
0

Thank you all ... appreciate your input

ADD REPLY
1

mpiro, it's nice that you wish to thank people for their input, but please do it as a comment on the question or on their answer; don't generate an additional 'answer'. If you feel one of these replies really helped you, make it the 'Accepted answer'.

ADD REPLY
3
11.3 years ago

Have you checked the 1000 Genomes site?

They made a major release of all the merged phase 1 data in November 2010; you can obtain the VCF files from the FTP link here.

ADD COMMENT
2
11.3 years ago

The only completely merged set that 1000 Genomes has released is their final release of 629 individuals, named Phase I, which currently appears as the latest news on the project's home page. It should be downloaded from EBI or NCBI, depending on your current location.

The summary of all 3 pilots has been put in a single place (again, mirrored on both EBI and NCBI), although the pilots haven't been merged into a single compressed VCF file like the final release. Instead, individuals' data merged by their population of origin can be downloaded. You will find the pilot 1 data under the "low coverage" folder (~180 individuals sequenced at ~2-4x), the pilot 2 data under the "trio" folder (6 individuals making 2 trios sequenced at ~20-60x), and the pilot 3 data under the "exon" folder (~1000 genes from ~900 individuals sequenced at ~50x).

ADD COMMENT
3

Also, the most up-to-date version of the pilot data is at ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/pilot_data/paper_data_sets/a_map_of_human_variation/ and it includes dbSNP 132 rs numbers for the pilot SNPs.

ADD REPLY
1

Not exactly redundant, but complementary ;) If what mpiro wants is a single merged file, sure, the Phase I data you mentioned should do. But if what he needs is data precisely from the pilot studies, I just wanted to point out that these have been released on a grouped-by-population basis and can be downloaded from 2 different main mirrors. Also, roughly describing what the 3 pilot studies are about may be of some interest to any BioStar reader arriving at this post who is not aware of the nature of those pilot studies.

ADD REPLY
0

Seems redundant with my answer.

ADD REPLY
0

This isn't the phase 1 release. It is an early release of main project SNPs based on the August alignments.

ADD REPLY
0

You're right, Laura, it is true that Phase I is not yet finished. I just meant to say that it was part of the final data and not from the pilot studies, so thanks for clarifying. The readme file describes this data as "An interim analysis of Phase I data was carried out based on the 2010.08.04 sequence index, which included 629 sequenced samples." And thank you also for pointing out the most up-to-date release of the pilot studies. I wasn't aware of that update, as they have published it in the new "paper_data_sets" ftp folder instead of the "release" folder I was periodically checking.

ADD REPLY
2
11.3 years ago
Laura ★ 1.8k

There is no merged set of variants for each pilot project.

I would suggest downloading the VCF files you want from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/pilot_data/paper_data_sets/a_map_of_human_variation/ (remembering that these are mapped to NCBI36, not GRCh37) and then using a tool like this one to merge the files together.
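For readers who only need a list of known sites (for example, as input to realignment or recalibration), here is a rough idea of what such a merge does. This is only a minimal Python sketch and not the tool linked above: `merge_sites` is a hypothetical helper, it assumes plain tab-separated VCF lines that all use the same reference build, it drops the genotype columns, and it keeps the ID/QUAL/FILTER/INFO values from whichever file reports a site first. The header it writes is a placeholder. For real work a dedicated VCF merging tool is the safer choice.

```python
def merge_sites(vcf_line_sets):
    """Union the sites of several VCFs into one sites-only list of lines.

    vcf_line_sets: an iterable of iterables of VCF text lines (one per file).
    Assumption: every input uses the same reference build (e.g. NCBI36).
    """
    chrom_order = {}  # chromosome -> rank by first appearance in the inputs
    sites = {}        # (chrom, pos, ref, alt) -> first 8 columns of the record

    for lines in vcf_line_sets:
        for line in lines:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # skip blank lines and header lines
            fields = line.split("\t")[:8]  # drop FORMAT and genotype columns
            chrom, pos, _id, ref, alt = fields[:5]
            chrom_order.setdefault(chrom, len(chrom_order))
            # Keep only the first record seen for a given site and allele pair.
            sites.setdefault((chrom, int(pos), ref, alt), fields)

    ordered = sorted(
        sites, key=lambda k: (chrom_order[k[0]], k[1], k[2], k[3])
    )
    header = [
        "##fileformat=VCFv4.0",  # placeholder header, adjust to your inputs
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    return header + ["\t".join(sites[k]) for k in ordered]
```

You would call it with one open file handle (or list of lines) per population VCF and write the returned lines to a new file. Sorting by first-seen chromosome order rather than alphabetically is a deliberate choice here, so that the output follows the contig order of the input files.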

ADD COMMENT
