Question: How can I download a SNP genotype file from 1000 Genomes?
evo_genomics (Pakistan) wrote, 5.7 years ago:

How can I download the genotypes of a specific SNP (a SNP in a coding region) for the African populations from the 1000 Genomes Project?

Thanks

wangyi2412 (China) wrote, 5.7 years ago:

Visit ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ for the actual data.

A description of the sample-to-population relationships is also available on the same FTP site. I don't have my notes on the specific directory to hand right now, but browsing the documentation on the site should turn it up without much effort.

Hope this helps.

And you can check ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/current.tree if you want to find other things.

Comment written 5.7 years ago by Tommy Carstensen
rbagnall (Australia) wrote, 5.7 years ago:

You can use tabix if you prefer not to download the large VCF files in full.

To retrieve a single SNP, say the one at chr6 position 7580958 (1-based coordinates on GRCh37, as used in the 1000 Genomes phase 3 data), query just that region of the sites file. The format is: tabix name-of-vcf-file chr:start-end

tabix ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5.20130502.sites.vcf.gz 6:7580958-7580959
6    7580958    rs2076299    A    G    100    PASS    AC=1018;AF=0.203275;AN=5008;NS=2504;DP=21936;EAS_AF=0.2867;AMR_AF=0.1772;AFR_AF=0.3139;EUR_AF=0.0358;SAS_AF=0.1585;AA=A|||

So the frequency of the alternate allele (G) of rs2076299 in the African (AFR) super-population of the 1000 Genomes data is AFR_AF=0.3139.
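If only one value from the INFO column is needed, a small filter can pull it out of the tabix output. The following is a minimal sketch rather than part of the original answer: `info_value` is a hypothetical helper name, and it assumes tab-separated VCF data lines on standard input, as tabix prints them.

```shell
#!/bin/sh
# info_value KEY: read VCF data lines on stdin and print the value of KEY
# from the INFO column (column 8). Hypothetical helper, not part of tabix.
info_value() {
    awk -F '\t' -v key="$1" '
        /^#/ { next }                      # skip header lines
        {
            n = split($8, fields, ";")     # INFO is a ;-separated list of KEY=VALUE
            for (i = 1; i <= n; i++) {
                eq = index(fields[i], "=")
                if (eq > 0 && substr(fields[i], 1, eq - 1) == key)
                    print substr(fields[i], eq + 1)
            }
        }'
}

# Example using the record shown above, rebuilt with real tab characters.
printf '6\t7580958\trs2076299\tA\tG\t100\tPASS\tAC=1018;AF=0.203275;AN=5008;NS=2504;DP=21936;EAS_AF=0.2867;AMR_AF=0.1772;AFR_AF=0.3139;EUR_AF=0.0358;SAS_AF=0.1585;AA=A|||\n' |
    info_value AFR_AF
```

With live data the same helper would sit at the end of the pipeline, for example `tabix <sites-vcf-url> 6:7580958-7580958 | info_value AFR_AF`. tabix regions are 1-based and inclusive at both ends, so a range whose start and end are equal selects exactly one position.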


Ah, now I see I have shown how to get the allele frequency, when genotypes were asked for. You can still use tabix, but you will need to query the chromosome-specific VCF files of the 1000 Genomes data, which contain the genotypes. (Note the ALL.chr6. part of the file name; change this to your chromosome of choice.)

tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr6.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz 6:7580958-7580959

In the above example I have included the -h option, which also prints the VCF header, including the sample IDs (e.g. NA21122, NA21123, NA21124, NA21125, etc.). After the header lines comes the variant record, including the genotypes:

6    7580958    rs2076299    A    G    100    PASS    AC=1018;AF=0.203275;AN=5008;NS=2504;DP=21936;EAS_AF=0.2867;AMR_AF=0.1772;AFR_AF=0.3139;EUR_AF=0.0358;SAS_AF=0.1585;AA=A|||    GT    0|0    0|0    0|0    0|0    0|1    0|1    0|0    0|0    0|0    0|0    0|0    0|1    0|0    0|0    0|0    0|0    0|0    0|0    0|0    0|0
...etc
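A genotype record like this has one column per sample, so it is easier to read once tallied. The sketch below counts how often each genotype string occurs per record. `genotype_counts` is a hypothetical helper name, and the example line keeps only the first 20 genotypes quoted above, with the INFO column shortened to keep it readable.

```shell
#!/bin/sh
# genotype_counts: read VCF data lines on stdin and, for each record, print
# "ID genotype count" for every genotype string in the sample columns
# (column 10 onward). Hypothetical helper, not part of tabix.
genotype_counts() {
    awk -F '\t' '
        /^#/ { next }                          # skip header lines
        {
            split("", seen)                    # reset the tally for this record
            for (i = 10; i <= NF; i++) seen[$i]++
            for (gt in seen) print $3, gt, seen[gt]
        }' | LC_ALL=C sort
}

# Example: the first 20 genotypes of rs2076299 from the output above
# (INFO shortened to a single key for brevity).
printf '6\t7580958\trs2076299\tA\tG\t100\tPASS\tAF=0.203275\tGT\t0|0\t0|0\t0|0\t0|0\t0|1\t0|1\t0|0\t0|0\t0|0\t0|0\t0|0\t0|1\t0|0\t0|0\t0|0\t0|0\t0|0\t0|0\t0|0\t0|0\n' |
    genotype_counts
```

For those 20 samples this reports 17 occurrences of 0|0 and 3 of 0|1. On the full record the same function would tally all 2504 sample columns.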

Now you need to know the population of each sample ID. You can find that information in this Excel file:

http://www.1000genomes.org/sites/1000genomes.org/files/documents/20101214_1000genomes_samples.xls

From this file I can see that samples NA19092 to NA19266 are YRI (Yoruba in Ibadan, Nigeria).
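Once the sample-to-population table is available as text, the genotype columns can be restricted to one population. This is a minimal sketch under stated assumptions: the table has been saved as a tab-separated file with the sample ID in column 1 and the population code in column 2 (the Excel file would have to be exported that way first), and `samples_in_pop` and `keep_samples` are hypothetical helper names.

```shell
#!/bin/sh
# samples_in_pop PANEL POP: print the sample IDs whose population code
# (column 2 of a tab-separated sample/population table) equals POP.
# The two-column layout is an assumption about how the table was exported.
samples_in_pop() {
    awk -F '\t' -v pop="$2" '$2 == pop { print $1 }' "$1"
}

# keep_samples IDS: read a VCF (with header) on stdin and keep the 9 fixed
# columns plus only those sample columns whose ID is listed in file IDS.
keep_samples() {
    awk -F '\t' -v OFS='\t' '
        FILENAME == ARGV[1] { want[$1] = 1; next }   # IDS file: wanted samples
        /^##/ { print; next }                        # meta lines pass through
        /^#CHROM/ {                                  # column header: pick columns
            n = 0
            for (i = 10; i <= NF; i++) if (($i) in want) cols[++n] = i
        }
        {
            line = $1
            for (i = 2; i <= 9; i++) line = line OFS $i
            for (j = 1; j <= n; j++) line = line OFS $(cols[j])
            print line
        }' "$1" -
}
```

A typical use would be `samples_in_pop panel.txt YRI > yri.txt` followed by `tabix -h <genotypes-vcf-url> 6:7580958-7580958 | keep_samples yri.txt`, where `panel.txt` and `yri.txt` are placeholder file names. The -h option matters here because the #CHROM header line is what maps column positions to sample IDs.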

Comment written 5.7 years ago by rbagnall

    Thank you

Comment written 5.7 years ago by evo_genomics