2
1
6.8 years ago
evo_genomics ▴ 60

How can I download the genotypes of a specific SNP (a SNP in a coding region) for the African populations from the 1000 Genomes Project?

Thanks

SNP genotype population Genetics • 5.4k views
2
6.8 years ago
wangyi2412 ▴ 230

Visit ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ for the actual data.

There is also a description of the sample-to-population relationships on the same FTP site. I don't have my notes on the specific directory to hand right now, but browsing the docs on the site should turn it up without much effort.

Hope this helps.

0

And you can check ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/current.tree if you want to find other things.

2
6.8 years ago
rbagnall ★ 1.7k

You can use tabix if you prefer not to download the large VCF files of the actual data.

To retrieve a single SNP, let's say chr6 at nucleotide position 7580958 (1-based numbering on GRCh37, from the 1000 Genomes phase 3 data), the format is: tabix name-of-vcf-file chr:start-end

tabix ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5.20130502.sites.vcf.gz 6:7580958-7580959
6    7580958    rs2076299    A    G    100    PASS    AC=1018;AF=0.203275;AN=5008;NS=2504;DP=21936;EAS_AF=0.2867;AMR_AF=0.1772;AFR_AF=0.3139;EUR_AF=0.0358;SAS_AF=0.1585;AA=A|||


So the African (AFR) allele frequency of rs2076299 in the 1000 Genomes data is 0.3139 (AFR_AF=0.3139).
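If you want to pull the per-population frequencies out of that line in a script rather than by eye, here is a minimal Python sketch. It assumes a biallelic site (multi-allelic records carry comma-separated values that `float` would reject), and the record is rebuilt with tabs as in a real VCF. The helper name `parse_info` is my own and not part of any library.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


# The sites record quoted above, joined with tabs as tabix would print it.
record = "\t".join([
    "6", "7580958", "rs2076299", "A", "G", "100", "PASS",
    "AC=1018;AF=0.203275;AN=5008;NS=2504;DP=21936;EAS_AF=0.2867;"
    "AMR_AF=0.1772;AFR_AF=0.3139;EUR_AF=0.0358;SAS_AF=0.1585;AA=A|||",
])

# INFO is the 8th column (index 7) of a VCF record.
info = parse_info(record.split("\t")[7])

# Keep only the super-population frequencies (keys ending in "_AF").
population_af = {k: float(v) for k, v in info.items() if k.endswith("_AF")}
print(population_af["AFR_AF"])
```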

0

Ah, now I see that I showed how to get the allele frequency when genotypes were asked for. You can still use tabix, but you will need to query the chromosome-specific VCF files of the 1000 Genomes data, which contain genotypes. (Note the ALL.chr6. part of the file path; change the 6 to your chromosome of choice.)

tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr6.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz 6:7580958-7580959


In the above example, I have included the -h option, which prints the VCF header, including the sample IDs (e.g. NA21122 NA21123 NA21124 NA21125, etc.). After the header lines comes the variant information, including genotypes:

6    7580958    rs2076299    A    G    100    PASS    AC=1018;AF=0.203275;AN=5008;NS=2504;DP=21936;EAS_AF=0.2867;AMR_AF=0.1772;AFR_AF=0.3139;EUR_AF=0.0358;SAS_AF=0.1585;AA=A|||    GT    0|0    0|0    0|0    0|0    0|1    0|1    0|0    0|0    0|0    0|0    0|0    0|1    0|0    0|0    0|0    0|0    0|0    0|0    0|0    0|0
...etc
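To repeat this query for other chromosomes or positions, the command can be assembled in a short script. This is a sketch only: it assumes the autosome file names follow the pattern quoted in this thread (they may have been renamed on the server since, and chrX/chrY are not covered), and `tabix_command` is a hypothetical helper name. It uses `pos-pos` as the region, which is enough to select a single 1-based position.

```python
# Base URL and file-name pattern copied from the command shown in this thread.
BASE = "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/"
TEMPLATE = (
    "ALL.chr{chrom}.phase3_shapeit2_mvncall_integrated_v5"
    ".20130502.genotypes.vcf.gz"
)


def tabix_command(chrom, pos):
    """Build the tabix argument list for one position on an autosome."""
    if not 1 <= chrom <= 22:
        # Assumption: only autosomes 1-22 follow this naming pattern.
        raise ValueError("this sketch only covers autosomes 1-22")
    url = BASE + TEMPLATE.format(chrom=chrom)
    region = f"{chrom}:{pos}-{pos}"
    # -h also prints the header, which carries the sample IDs.
    return ["tabix", "-h", url, region]


print(" ".join(tabix_command(6, 7580958)))
```

The list form can be passed directly to `subprocess.run` if you want to run tabix from Python instead of copying the printed command into a shell.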


Now you need to know the population of each sample ID, and you can find that information in this Excel file:

http://www.1000genomes.org/sites/1000genomes.org/files/documents/20101214_1000genomes_samples.xls

From this file I can see that samples NA19092 to NA19266 are YRI (Yoruba in Ibadan, Nigeria).
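Once you have a list of African sample IDs from that file, matching them to the genotype columns is a matter of lining up the `#CHROM` header line with the record line. A minimal Python sketch follows. The sample IDs and genotypes in it are made-up placeholders, not real 1000 Genomes data, and `genotypes_for` is a hypothetical helper name.

```python
def genotypes_for(header_line, record_line, wanted):
    """Return {sample_id: genotype} for the sample IDs listed in `wanted`."""
    # The first nine VCF columns are fixed; sample columns start at the 10th.
    samples = header_line.rstrip("\n").split("\t")[9:]
    calls = record_line.rstrip("\n").split("\t")[9:]
    if len(samples) != len(calls):
        raise ValueError("header and record have different sample counts")
    wanted = set(wanted)
    # GT is the first FORMAT sub-field when present, so take what precedes ':'.
    return {s: c.split(":")[0] for s, c in zip(samples, calls) if s in wanted}


# Placeholder header and record (illustrative values only).
header = "\t".join([
    "#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
    "SAMPLE_A", "SAMPLE_B", "SAMPLE_C",
])
record = "\t".join([
    "6", "7580958", "rs2076299", "A", "G", "100", "PASS", ".", "GT",
    "0|0", "0|1", "1|1",
])

# Placeholder for the IDs you would read from the sample spreadsheet.
african_ids = {"SAMPLE_B", "SAMPLE_C"}

result = genotypes_for(header, record, african_ids)
print(result)
```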

0

Thank you