Question: SNPs/indels with individual genotypes from the 1000 Genomes FTP site
lait wrote (7 months ago):

Sorry if this might be a trivial question!

I have read a lot about this and ended up lost. I need to download a whole-genome sequencing (WGS) VCF file from the 1000 Genomes FTP site. I need the variants (SNVs and indels) and, most importantly, the individual genotypes of every person involved.

For example, this file:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz

which has been referenced many times on Biostars, does not contain individual genotypes. I need the same kind of content, but with genotypes. Is there one global file containing the SNPs/indels for the WGS data, including the genotypes of the individual samples?

thanks!

Tags: ftp, genotypes, 1000 genomes, vcf
Kevin Blighe (Republic of Ireland) wrote (7 months ago):

You can download the entire data set per chromosome (chr1-22 and chrX), including individual genotypes for both indels and SNPs, using this code:

prefix="ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr"

suffix=".phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz"

# Fetch each chromosome's VCF together with its tabix index (.tbi)
for chr in {1..22} X; do
    wget "${prefix}${chr}${suffix}" "${prefix}${chr}${suffix}.tbi"
done

From: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format

Kevin
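
To check that a downloaded file really carries per-sample genotypes (unlike the `sites` file from the question), one option is to count the sample columns in the VCF header. The following is a minimal sketch, not part of the original answer; `count_samples` is a hypothetical helper name, and it assumes only `gzip` and `awk` are available. It relies on the VCF layout in which the `#CHROM` header line has eight fixed columns, then `FORMAT`, then one column per sample.

```shell
# Count the per-sample genotype columns in a gzipped VCF.
# Columns 1-8 of the "#CHROM" header line are fixed, column 9 is FORMAT,
# and every column from 10 onward names one sample.
# A sites-only VCF stops at column 8, so the count is 0.
count_samples() {
    gzip -dc "$1" | awk -F'\t' '/^#CHROM/ { print (NF > 9 ? NF - 9 : 0); exit }'
}
```

For example, `count_samples ALL.chr21.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz` should print a positive number for a genotypes file and 0 for a sites-only file.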


I can't get the files; it says the host is not resolvable. I also tried the NCBI website, but none of the pages can be opened. Is there another way to download the human VCF files directly from the terminal?

Reply by marongiu.luigi, 7 months ago

I can connect; I did so just 10 seconds ago. Where are you downloading the data to?

Reply by Kevin Blighe, 7 months ago

I tried several; the only page that opened is http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/. The others were ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/ and the links provided on https://www.ncbi.nlm.nih.gov/variation/docs/human_variation_vcf/ and http://www.internationalgenome.org/data#download.

Reply by marongiu.luigi, 7 months ago

The direct links are:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr3.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr4.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr5.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr6.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr7.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
et cetera

...and the tabix index files:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr3.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi
et cetera

Reply by Kevin Blighe, 7 months ago (modified by genomax)

Thank you, but these also give me time-out errors. However, this worked really fast:

wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr21.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz.tbi

Is there a problem with the server? Do these sites point to the same data?

Reply by marongiu.luigi, 7 months ago

That file is a tabix index file, which is very small, so it will download very quickly almost anywhere unless you are using a dial-up modem of 7.5kbps (or less).

Let's just try chr1 variants, first:

wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
Reply by Kevin Blighe, 7 months ago
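
Since the `ftp://` links timed out in this thread while an `http://` link on the same host downloaded quickly, one workaround worth trying is to fall back to HTTP when FTP fails. The sketch below is not from the thread: `to_http` and `fetch_with_fallback` are hypothetical helper names, and it assumes (without confirmation from the thread) that the HTTP and FTP paths on ftp.1000genomes.ebi.ac.uk expose the same directory tree. The `wget` options used are `-c` (resume a partial download), `--tries` and `--timeout`.

```shell
# Rewrite an ftp:// URL to http:// on the same host; leave other URLs as-is.
# Assumption: both protocols serve the same files on this host.
to_http() {
    case $1 in
        ftp://*) printf 'http://%s\n' "${1#ftp://}" ;;
        *)       printf '%s\n' "$1" ;;
    esac
}

# Try the URL as given; if wget fails (e.g. a time-out), retry over HTTP.
# -c resumes a partial file, --tries/--timeout bound how long we wait.
fetch_with_fallback() {
    wget -c --tries=3 --timeout=30 "$1" ||
        wget -c --tries=3 --timeout=30 "$(to_http "$1")"
}
```

Usage would look like `fetch_with_fallback "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz"`. Because the genotype files are large, running `gzip -t` on the result afterwards is a cheap way to catch a truncated download.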