Until now, I had always assumed that these files contain only SNPs that are present in all populations. For example, I thought that SNPs present only in the European population ("private" to Europeans) would be filtered out of this dataset.
The problem is that I can't find any reference or README file confirming that the VCF files in 1000 Genomes contain only 'cosmopolitan' SNPs. Can anyone please point me to a reference or documentation file?
There is no filter to remove population-specific SNPs, which is why you can't find a reference to one.
It is rare to find, for example, SNPs at medium frequency in Europe and absent from all other populations.
Singleton SNPs are by definition also population-specific, but in the process of trying to reduce the false discovery rate (FDR), we have lower power to detect these.
Figure 3 of the Phase 1 paper discusses f2 variants (which occur on 2 chromosomes across all 1092 samples),
and you can see how often these SNPs are shared between two populations.
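If you want to check this yourself, here is a minimal sketch in plain Python that reads the INFO column of a VCF record and reports which continental groups carry the alternate allele. It assumes the INFO column has per-group frequency tags named `AFR_AF`, `AMR_AF`, `ASN_AF` and `EUR_AF`; check the header of your own release, since tag names differ between releases. Treating a missing tag as "absent in that group" is also an assumption, and the helper names are purely illustrative.

```python
# Assumed tag names; adjust to match the ##INFO lines in your VCF header.
POP_KEYS = ("AFR_AF", "AMR_AF", "ASN_AF", "EUR_AF")


def parse_info(info):
    """Parse a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def populations_with_variant(info, pop_keys=POP_KEYS):
    """Return the tags whose alternate allele frequency is above zero."""
    fields = parse_info(info)
    present = []
    for key in pop_keys:
        value = fields.get(key)
        # Assumption: a missing tag means the allele was not seen in that group.
        if isinstance(value, str) and any(float(v) > 0 for v in value.split(",")):
            present.append(key)
    return present


def is_private(info, pop_keys=POP_KEYS):
    """True if the alternate allele is seen in exactly one group."""
    return len(populations_with_variant(info, pop_keys)) == 1


# Made-up INFO strings, for illustration only.
print(is_private("AC=5;AF=0.002;EUR_AF=0.01"))      # European-only record
print(is_private("AC=300;AFR_AF=0.1;EUR_AF=0.2"))   # shared record
```

To scan a real file, you would apply `is_private` to the eighth tab-separated column of each non-header line. Keep in mind that "not observed" in a sample of this size is not the same as "absent from the population".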
My name is Ahmed. I worked with 1000 Genomes data as the basis of my Master's project and published a paper (in Genome Biology and Evolution) about population stratification and inference of familial relationships from genomic data. I also computed a lot of data that was not published and that I still have, including continent-specific SNPs (African-, European- and Asian-specific SNPs). If you would like, I can share those results with you.
Contact me by email at firstname.lastname@example.org if you are still interested in such data.