Question: snps/indels with individual genotypes from 1000 genomes ftp site
lait wrote (2.1 years ago):

Sorry if this might be a trivial question!

I have read a lot about this and ended up lost. I need to download a WGS VCF file from the 1000 Genomes FTP site. I need the variants (SNVs and indels) and, most importantly, the individual genotypes of all the people sampled.

For example, this file:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz

It has been referenced many times on Biostars, but it does not contain individual genotypes. I need something similar to what those files contain, with the genotypes included. Is there one global file containing SNPs/indels for the WGS data, including the genotypes of the various samples?

thanks!

Kevin Blighe wrote (2.1 years ago):

You can download the complete data per chromosome (chr1-22 and chrX), including individual genotypes for both indels and SNPs, using this code:

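# Base URL and file-name suffix of the per-chromosome genotype VCFs
prefix="ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr"
suffix=".phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz"

# Fetch each VCF together with its tab-index (.tbi) file.
# Note: the chrX file may carry a different version tag than v5a;
# if that download fails, check the directory listing for its exact name.
for chr in {1..22} X; do
    wget "${prefix}${chr}${suffix}" "${prefix}${chr}${suffix}.tbi"
done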

From: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format

Kevin
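The difference between the sites file in the question and the genotypes files in this answer shows up in the header line: a sites-only VCF stops at the INFO column, while a VCF with genotypes adds FORMAT plus one column per sample. A minimal sketch for checking a downloaded file, assuming `gzip` and `awk` are available; the function name `count_samples` is made up for illustration:

```shell
#!/usr/bin/env bash
# Print the number of sample columns in a gzipped VCF.
# The #CHROM header line has 8 fixed columns (CHROM through INFO);
# a file with genotypes adds FORMAT and then one column per sample.
count_samples() {
    local vcf="$1"
    gzip -dc "$vcf" |
        awk -F'\t' '/^#CHROM/ { n = NF - 9; if (n < 0) n = 0; print n; exit }'
}
```

Running `count_samples` on one of the downloaded `.genotypes.vcf.gz` files should print a non-zero count, whereas a `.sites.vcf.gz` file should print 0.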


I can't get the files; it says the host is not resolvable. I also tried the NCBI website, but none of the pages can be opened. Is there another way to download the human VCF files directly from the terminal?

written 2.1 years ago by marongiu.luigi

I can connect; I did so just 10 seconds ago. Where are you downloading the data to?

written 2.1 years ago by Kevin Blighe (modified 2.1 years ago)

I tried several; the only page that opened is http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/. The others were ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/ and the links provided at https://www.ncbi.nlm.nih.gov/variation/docs/human_variation_vcf/ and http://www.internationalgenome.org/data#download.

written 2.1 years ago by marongiu.luigi

The direct links are:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr3.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr4.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr5.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr6.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr7.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
et cetera

...and, the tab-index files:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr3.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi
et cetera

written 2.1 years ago by Kevin Blighe (modified 2.1 years ago by genomax)
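The "et cetera" in the two lists above can be expanded mechanically, since only the chromosome label changes between links. A minimal sketch, assuming every requested chromosome uses the same v5a file name as the links shown; the function name `list_urls` and the `SCHEME` variable are made up for illustration:

```shell
#!/usr/bin/env bash
# Print the VCF URL and the .tbi URL for each chromosome label given.
# SCHEME defaults to ftp; set SCHEME=http to print HTTP links instead.
list_urls() {
    local scheme="${SCHEME:-ftp}"
    local base="${scheme}://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502"
    local suffix="phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz"
    local chr
    for chr in "$@"; do
        printf '%s/ALL.chr%s.%s\n' "$base" "$chr" "$suffix"
        printf '%s/ALL.chr%s.%s.tbi\n' "$base" "$chr" "$suffix"
    done
}
```

The list can be piped straight into wget, which reads URLs from standard input with `-i -` and resumes partial downloads with `-c`, for example `list_urls {1..22} | wget -c -i -`.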

Thank you, but these are also giving me timeout errors. This, however, worked really fast:

wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr21.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi

Is there a problem with the server? Are these sites pointing to the same data?

written 2.1 years ago by marongiu.luigi

That file is a tab-index file, which is very small; so, it will download very quickly in most places unless you are using a dial-up modem of 7.5kbps (or less).

Let's just try chr1 variants, first:

wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz

written 2.1 years ago by Kevin Blighe
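In this thread the ftp:// links timed out for one user while an http:// link on the same host and path downloaded quickly. That suggests FTP may be blocked on some networks while HTTP is not. A minimal sketch that tries FTP first and falls back to HTTP, assuming the HTTP address serves the same directory tree (which the thread suggests but does not confirm); the function name `fetch_with_fallback` and the `FETCH` override are made up for illustration:

```shell
#!/usr/bin/env bash
# Try each URL scheme in turn until one download succeeds.
# FETCH is the download command; by default wget resumes partial files (-c),
# gives up on a stalled connection after 30 seconds, and tries twice.
fetch_with_fallback() {
    local path="$1" scheme
    for scheme in ftp http; do
        if ${FETCH:-wget -c --timeout=30 --tries=2} "${scheme}://ftp.1000genomes.ebi.ac.uk/${path}"; then
            return 0
        fi
        echo "download via ${scheme} failed" >&2
    done
    return 1
}
```

For the chr1 file above, the call would be `fetch_with_fallback vol1/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz`.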