I have a data set of 608,038 autosomal SNPs from the Affymetrix Human Origins array, which has 620,744 autosomal SNPs in total. These were filtered in Genotyping Console so that they are all autosomal SNPs passing a 95% SNP call-rate filter across 93+2 samples (+2 = positive control DNA samples), with ~4000 possibly triallelic sites excluded.
I have also downloaded a data set with 594,924 autosomal SNPs generated using the same type of array (http://genetics.med.harvard.edu/reich/Reich_Lab/Datasets_files/EuropeFullyPublic.tar.gz).
When I merge the two data sets, the total number of SNPs falls to 545,956.
This happens when I merge the data sets in Plink, as well as when I take the intersection of the rs/Affx IDs from both data sets in R:
    first <- read.table("first.map", header = FALSE)
    second <- read.table("second.map", header = FALSE)
    consensus <- intersect(first$V2, second$V2)
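To narrow down which side the missing IDs come from, a rough sketch along these lines might help. It assumes the standard four-column PLINK .map layout (V1 = chromosome, V2 = SNP ID, V3 = genetic distance, V4 = bp position); the helper name `compare_maps` and the chr:position key are purely illustrative and are not part of PLINK or any package.

```r
# Hypothetical helper: compare two PLINK .map tables by SNP ID and by
# chr:position. Assumes columns V1 = chr, V2 = SNP ID, V4 = bp position.
compare_maps <- function(first, second) {
  ids1 <- as.character(first$V2)
  ids2 <- as.character(second$V2)
  key1 <- paste(first$V1, first$V4, sep = ":")
  key2 <- paste(second$V1, second$V4, sep = ":")

  only_first  <- !(ids1 %in% ids2)
  only_second <- !(ids2 %in% ids1)

  list(
    shared_by_id       = sum(ids1 %in% ids2),
    only_in_first      = sum(only_first),
    only_in_second     = sum(only_second),
    # IDs unique to the first set whose chr:pos still occurs in the second:
    # candidates for the same site under a different name (e.g. rs vs Affx)
    renamed_candidates = sum(only_first & key1 %in% key2)
  )
}

# Usage with the files read in above (file names as in the post):
# first  <- read.table("first.map",  header = FALSE, stringsAsFactors = FALSE)
# second <- read.table("second.map", header = FALSE, stringsAsFactors = FALSE)
# str(compare_maps(first, second))
```

If `renamed_candidates` is large, part of the drop may be down to naming differences rather than truly absent sites. This only holds if both .map files use the same genome build, and `shared_by_id` matches the length of `consensus` only when IDs are unique within each file.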
I was wondering if anyone has any idea where these ~50,000 SNPs are going.
This is still plenty of SNPs for human evolutionary/population genetic inferences, but I don't want to be asked one day by an article reviewer why I lost this many sites in the merge.