Question: Are There Population-Specific SNPs Included In The 1000 Genomes VCF Files?
Giovanni M Dall'Olio (London, UK) wrote, 6.4 years ago:

Phase 1 of the 1000 Genomes Project published genotypes for 1,092 individuals and made them available on its FTP server.

Until now, I had assumed that these files contain only SNPs that are present in all populations. For example, I thought that SNPs present only in the European population ("private" to Europeans) would have been filtered out of this dataset.

The problem is that I can't find any reference or README file confirming that the 1000 Genomes VCF files contain only 'cosmopolitan' SNPs. Can anyone please point me to a reference or documentation file?

Thanks in advance!

Tags: vcf, 1000genomes, snp
zam.iqbal.genome (United Kingdom) wrote, 6.4 years ago:

Hi there

There is no filter to remove population-specific SNPs, which is why you can't find a reference to one. It is rare to find, for example, SNPs at medium frequency in Europe that are absent from all other populations. Singleton SNPs are by definition also population-specific, but because we were trying to reduce the false discovery rate (FDR), we have lower power to detect them. Figure 3 in the Phase 1 paper discusses f2 variants (which occur on two chromosomes across all 1,092 samples), and you can see how often these SNPs are shared between two populations.
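To make the singleton and f2 terminology concrete, here is a minimal Python sketch (not part of the original answer) that counts alternate alleles per population at one biallelic site and reports whether the variant is carried in one population or several. The function names, the sample IDs in the usage note, and the input dictionaries are hypothetical; a real analysis would read genotypes from the VCF and population labels from the project's sample panel file.

```python
from collections import defaultdict


def alt_counts_by_population(genotypes, sample_pop):
    """Count alternate alleles per population at one biallelic site.

    genotypes:  dict of sample ID -> VCF GT string, e.g. "0|1" or "1/1"
    sample_pop: dict of sample ID -> population label
    (both inputs are hypothetical structures used for illustration)
    """
    counts = defaultdict(int)
    for sample, gt in genotypes.items():
        alleles = gt.replace("/", "|").split("|")
        n_alt = sum(1 for allele in alleles if allele == "1")
        if n_alt:
            counts[sample_pop[sample]] += n_alt
    return dict(counts)


def classify_site(genotypes, sample_pop):
    """Label a site by total alt-allele count and by population sharing."""
    counts = alt_counts_by_population(genotypes, sample_pop)
    total = sum(counts.values())
    if total == 0:
        return "absent"
    # One copy in the whole sample is a singleton, two copies is an f2 variant.
    label = {1: "singleton", 2: "f2"}.get(total, "ac>=3")
    scope = "private" if len(counts) == 1 else "shared"
    return f"{label}, {scope}"
```

For example, with `sample_pop = {"s1": "GBR", "s2": "GBR", "s3": "YRI"}`, calling `classify_site({"s1": "0|1", "s2": "0|0", "s3": "1|0"}, sample_pop)` labels the site as an f2 variant shared between two populations, while a single heterozygous carrier is always a private singleton.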



Thank you very much. I knew that the 1000G paper discussed private variants, but for some reason I was convinced that these were not included in the data released on the FTP server. I probably confused them with some of the intermediate folders that were on the FTP server before the paper was published.

Reply written 6.4 years ago by Giovanni M Dall'Olio

No problem! I found it hard to keep up with the data except when I was tracking all the conference calls. Z

Reply written 6.4 years ago by zam.iqbal.genome

pd3 wrote, 6.4 years ago:

The VCFs contain population allele frequencies (AMR_AF, ASN_AF, AFR_AF, EUR_AF), which can be used to select sites that are more frequent in one population:

bcftools view -i'EUR_AF > AMR_AF & EUR_AF > ASN_AF' file.vcf.gz

(Link to bcftools.)
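The same per-population tags can also be read directly from the INFO column to flag sites seen in only one continental group. The following is a minimal Python sketch, not an official tool: the function names are hypothetical, it assumes biallelic records (one value per tag), and it assumes that a missing tag means a frequency of zero, which should be checked against the header of the VCF being used.

```python
# Population allele-frequency tags named in the answer above.
POP_TAGS = ("AFR_AF", "AMR_AF", "ASN_AF", "EUR_AF")


def parse_info(info):
    """Split a VCF INFO string into a dict of key -> raw string value."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
    return fields


def population_afs(info):
    """Return {tag: frequency}; a missing tag is ASSUMED to mean 0.0."""
    fields = parse_info(info)
    return {tag: float(fields.get(tag, 0.0)) for tag in POP_TAGS}


def private_population(info):
    """Return the single tag with non-zero frequency, or None if shared or absent."""
    present = [tag for tag, af in population_afs(info).items() if af > 0]
    return present[0] if len(present) == 1 else None
```

Applied to the eighth column of each data line, `private_population("AC=3;AN=2184;AF=0.0014;EUR_AF=0.004")` returns `"EUR_AF"`, whereas a record carrying both `AFR_AF` and `EUR_AF` above zero returns `None`.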

ahmedc3.ri (United States) wrote, 5.1 years ago:

Hi Giovanni,

My name is Ahmed. I worked with 1000 Genomes data as the basis of my Master's project and published a paper (in Genome Biology and Evolution) on population stratification and the inference of familial relationships from genomic data. I also computed a lot of data that was not published and that I still have, including continent-specific SNPs (African-, European-, and Asian-specific SNPs). If you would like, I can share those results with you.

Contact me by email if you are still interested in such data.




