Are There Population-Specific SNPs Included in the 1000 Genomes VCF Files?
7.7 years ago

Phase 1 of the 1000 Genomes Project published genotypes for 1,092 individuals and made them available on its FTP server: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/integrated_call_sets/

Until now, I had always assumed that these files contain only SNPs that are present in all populations. For example, I thought that SNPs found only in Europeans ("private" to Europeans) would be filtered out of this dataset.

The problem is that I can't find any reference or README file confirming that the 1000 Genomes VCF files contain only 'cosmopolitan' SNPs. Can anyone point me to a reference or documentation file?

7.7 years ago

Hi there

There is no filter that removes population-specific SNPs, which is why you can't find a reference to one. It is rare, for example, to find SNPs at medium frequency in Europe that are absent from all other populations. Singleton SNPs are by definition also population specific, but because we tried to keep the false discovery rate (FDR) low, we have less power to detect them. Figure 3 in the Phase 1 paper discusses f2 variants (variants that occur on exactly two chromosomes across all 1,092 samples), and you can see how often these SNPs are shared between two populations.
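To make the f2 idea concrete, here is a minimal Python sketch (not the Phase 1 pipeline) that flags sites whose alternate allele occurs on exactly two chromosomes and records whether both copies fall within one population or are shared between two. The genotype encoding (one tuple of 0/1 alleles per sample), the function names and the population labels in the example are hypothetical.

```python
from collections import Counter


def classify_f2(genotypes, populations):
    """Classify one site.

    genotypes:   list of (allele_a, allele_b) per sample, 0 = ref, 1 = alt (assumed encoding)
    populations: population label for each sample, in the same order
    Returns None if the site is not f2 (alt allele not on exactly 2 chromosomes),
    'POP' if both alt copies are in one population, or 'POP1-POP2' if shared.
    """
    carriers = []
    for gt, pop in zip(genotypes, populations):
        # One entry per alt-carrying chromosome, so a homozygote counts twice.
        carriers.extend([pop] * sum(gt))
    if len(carriers) != 2:
        return None
    a, b = sorted(carriers)
    return a if a == b else f"{a}-{b}"


def tally_f2(sites, populations):
    """Count f2 sites by 'private to one population' vs 'shared by two'."""
    counts = Counter()
    for genotypes in sites:
        label = classify_f2(genotypes, populations)
        if label is not None:
            counts[label] += 1
    return counts


if __name__ == "__main__":
    pops = ["EUR", "EUR", "AFR", "ASN"]  # hypothetical sample labels
    sites = [
        [(0, 1), (0, 1), (0, 0), (0, 0)],  # both copies in EUR samples
        [(0, 0), (0, 1), (1, 0), (0, 0)],  # one copy in EUR, one in AFR
        [(1, 1), (0, 0), (0, 0), (0, 0)],  # a single EUR homozygote
        [(0, 1), (0, 0), (0, 0), (0, 0)],  # singleton, not f2
    ]
    print(tally_f2(sites, pops))  # Counter({'EUR': 2, 'AFR-EUR': 1})
```

Singletons are ignored here on purpose: as noted above, they are population specific by definition, so they say nothing about sharing between populations.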

Zam

Thank you very much. I knew that the 1000G paper discussed private variants, but for some reason I was convinced that they were not included in the data released on the FTP server. I probably confused it with some of the intermediate folders that were on the FTP server before the paper was published.

No problem! I found it hard to keep up with the data except when I was tracking all the conference calls. Z

7.7 years ago
pd3

The VCFs contain per-population allele frequencies in the INFO fields AMR_AF, ASN_AF, AFR_AF and EUR_AF, which can be used to select sites that are more frequent in one population than in all the others, e.g. for Europeans:

bcftools view -i 'EUR_AF > AMR_AF & EUR_AF > ASN_AF & EUR_AF > AFR_AF' file.vcf.gz
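The same INFO fields can also pick out sites whose alternate allele was observed in only one continental group, which is closer to the "private SNP" question above. Below is a minimal Python sketch, not a bcftools feature, that reads plain VCF text lines. It assumes biallelic sites and treats a missing *_AF tag as frequency 0; whether Phase 1 omits zero-valued tags or writes them explicitly is an assumption you should check against the VCF header. The function names are hypothetical. Keep in mind that a frequency of 0 in the sample does not mean the allele is absent from the population.

```python
POPS = ("AMR", "ASN", "AFR", "EUR")  # continental groups named in the Phase 1 INFO tags


def parse_info(info):
    """Parse a VCF INFO string such as 'AC=2;AF=0.001;EUR_AF=0.003;VT=SNP' into a dict.
    Flag fields without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def pop_afs(info):
    """Return {population: alt allele frequency}; a missing tag is treated as 0 (assumption).
    Assumes biallelic sites, i.e. a single numeric value per tag."""
    fields = parse_info(info)
    afs = {}
    for pop in POPS:
        raw = fields.get(f"{pop}_AF")
        afs[pop] = float(raw) if isinstance(raw, str) else 0.0
    return afs


def private_to(info):
    """Return the single population with AF > 0, or None if zero or several have it."""
    present = [pop for pop, af in pop_afs(info).items() if af > 0]
    return present[0] if len(present) == 1 else None


def sites_private_to(vcf_lines, pop):
    """Yield (CHROM, POS, ID) for data lines whose alt allele is seen only in `pop`."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        if private_to(cols[7]) == pop:  # column 8 is INFO
            yield cols[0], int(cols[1]), cols[2]
```

To scan a real file, open it in text mode with gzip.open("file.vcf.gz", "rt") and pass the handle to sites_private_to(handle, "EUR").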


6.4 years ago
ahmedc3.ri

Hi Giovanni,

My name is Ahmed. I worked with 1000 Genomes data as the basis of my Master's project and published a paper (in the journal Genome Biology and Evolution) on population stratification and the inference of familial relationships from genomic data. I also computed a lot of data that wasn't published and that I still have, including continent-specific SNPs (African-, European- and Asian-specific SNPs). If you would like, I can share those results with you.

Contact me at ahmedc3.ri@gmail.com if you are still interested in such data.

Thanks

Ahmed