Question: SNPs/indels with individual genotypes from the 1000 Genomes FTP site
lait140 wrote:

Sorry if this might be a trivial question!

I have read a lot about this and got lost. I need to download a WGS VCF file from the 1000 Genomes FTP site. I need the variants (SNVs and indels) and, most importantly, the individual genotypes of every sample.

For example, this file:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz

which has been referenced many times on Biostars, is a sites-only file and does not contain individual genotypes. I need a file with the same variants plus the genotypes. Is there one global file containing the SNPs/indels for the WGS data, including the genotypes of all samples?

Thanks!

ftp genotypes 1000 genomes vcf • 685 views
ADD COMMENTlink modified 15 months ago by Kevin Blighe51k • written 15 months ago by lait140
Kevin Blighe51k wrote:

You can download the entire data set per chromosome (chr1-22 and chrX), including individual genotypes for both indels and SNPs, using this code:

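prefix="ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr"
suffix=".phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz"

# Fetch each VCF together with its tabix index (.tbi).
# Note: the chrX file may carry a different version tag than v5a; if that
# download fails, check the directory listing for the exact file name.
for chr in {1..22} X; do
    wget "${prefix}${chr}${suffix}" "${prefix}${chr}${suffix}.tbi"
done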
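
Once a file has been downloaded, one way to confirm that it really carries per-sample genotypes (unlike the sites-only file in the question) is to count the columns on the `#CHROM` header line. The following is a minimal sketch, not part of the original answer; the function name `count_samples` is hypothetical, and it assumes `gzip` and `awk` are available.

```shell
# Hypothetical helper: print how many sample columns a VCF carries.
# A sites-only VCF stops at the 8 fixed columns (CHROM..INFO);
# a genotype VCF adds a FORMAT column plus one column per sample.
count_samples() {
    local vcf="$1"
    # gzip -cdf decompresses .gz input (and passes plain text through)
    gzip -cdf "$vcf" | awk -F'\t' '
        /^#CHROM/ { n = (NF > 9) ? NF - 9 : 0; print n; found = 1; exit }
        END { if (!found) exit 1 }'
}
```

For example, `count_samples ALL.chr21.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz` should print a non-zero number for a genotype file, whereas a sites-only file should print 0.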

From: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format

Kevin

ADD COMMENTlink written 15 months ago by Kevin Blighe51k

I can't get the files; it says the host cannot be resolved. I also tried the NCBI website, but none of the pages would open. Is there another way to download the human VCF files directly from the terminal?

ADD REPLYlink written 15 months ago by marongiu.luigi420

I can connect; I did so just now. Where are you downloading the data to?

ADD REPLYlink modified 15 months ago • written 15 months ago by Kevin Blighe51k

I tried several. The only page that opened is http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/. The others were ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/ and the links provided on https://www.ncbi.nlm.nih.gov/variation/docs/human_variation_vcf/ and http://www.internationalgenome.org/data#download.

ADD REPLYlink written 15 months ago by marongiu.luigi420

The direct links are:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr3.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr4.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr5.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr6.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr7.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
et cetera

...and the tabix index files:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr3.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi
et cetera

ADD REPLYlink modified 15 months ago by genomax73k • written 15 months ago by Kevin Blighe51k

Thank you, but these are also giving me timeout errors. However, it worked really fast with:

wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr21.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi

Is there a problem with the server? Do these sites point to the same data?

ADD REPLYlink written 15 months ago by marongiu.luigi420

That file is a tabix index file, which is very small, so it will download quickly almost anywhere unless you are using a dial-up modem of 7.5kbps (or less).

Let's try just the chr1 variants first:

wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ADD REPLYlink written 15 months ago by Kevin Blighe51k
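
Since the thread reports that the `http://` address responded while the `ftp://` links timed out, a download loop could try both schemes in turn. The sketch below is an illustration only: it assumes the HTTP address serves the same release directory as the FTP one, and the function names `vcf_url` and `fetch_chr` are hypothetical. The `wget` options used (`-c` to resume a partial download, `--tries`, `--timeout`) are standard.

```shell
# Assumption: both schemes expose the same release directory.
base_path="ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502"
suffix=".phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz"

# Print the VCF URL for one chromosome under the given scheme (ftp or http).
vcf_url() {
    local scheme="$1" chr="$2"
    printf '%s://%s/ALL.chr%s%s\n' "$scheme" "$base_path" "$chr" "$suffix"
}

# Try FTP first, then fall back to HTTP.
# Downloads the VCF and its .tbi index; -c resumes a partial file.
fetch_chr() {
    local chr="$1" scheme url
    for scheme in ftp http; do
        url=$(vcf_url "$scheme" "$chr")
        if wget -c --tries=3 --timeout=30 "$url" "$url.tbi"; then
            return 0
        fi
    done
    return 1
}
```

It could then be called per chromosome, for example `for chr in {1..22}; do fetch_chr "$chr"; done`. The retry count and timeout are arbitrary starting values and may need adjusting for a slow connection.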