I have been wondering how ancestry information is obtained. For example, 23andme and ancestry.com both offer ancestry composition analysis, which tells you how much of your ancestry is English, how much is German, etc. There are also companies doing ancestry analysis for dogs, which will tell you your dog is 10% golden retriever, 30% beagle, ...
Technically, how is this done?
One algorithm I can think of is to
1) find the specific sequence features (e.g. SNPs) that are characteristic of each human ethnic group or dog breed
2) use a likelihood ratio to calculate the probability of the tested sample belonging to each ethnic group or breed
3) normalize the likelihoods to get the ancestry composition of the tested sample (see the sketch after this list)
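For the first approach, here is a minimal sketch of steps 2-3, assuming you already have per-population reference-allele frequencies for a panel of ancestry-informative SNPs (the populations, frequencies, and genotypes below are made up purely for illustration):

```python
import numpy as np
from scipy.stats import binom

# Hypothetical per-population reference-allele frequencies for 4 ancestry-informative SNPs.
# Rows = populations, columns = SNPs; in practice these would come from a reference panel.
populations = ["English", "German", "Korean"]
allele_freq = np.array([
    [0.10, 0.80, 0.35, 0.60],   # English
    [0.15, 0.75, 0.40, 0.55],   # German
    [0.70, 0.20, 0.90, 0.10],   # Korean
])

# Genotypes of the tested sample: count of the reference allele at each SNP (0, 1, or 2)
sample_genotype = np.array([0, 2, 1, 2])

def log_likelihood(genotype, freqs):
    """Log-likelihood of diploid genotypes given one population's allele
    frequencies, assuming Hardy-Weinberg equilibrium and independent SNPs."""
    p = np.clip(freqs, 1e-6, 1 - 1e-6)            # avoid log(0)
    return binom.logpmf(genotype, 2, p).sum()     # Binomial(2, p) per SNP

loglik = np.array([log_likelihood(sample_genotype, f) for f in allele_freq])

# Normalize across populations (softmax of the log-likelihoods) so the weights sum to 1
weights = np.exp(loglik - loglik.max())
weights /= weights.sum()

for pop, w in zip(populations, weights):
    print(f"{pop}: {w:.2%}")
```

One caveat with this sketch: normalizing single-population likelihoods really answers "which one population is the best fit", not "what fraction of the genome comes from each"; to get genuine admixture proportions you would have to model each SNP or chromosome segment as a mixture over populations.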
Another algorithm I can think of is to
1) get the eigenvectors (principal components) of the genome-wide SNP matrix
2) find the eigenvectors that separate the ethnic groups or breeds
3) calculate the distance of the tested sample to each of the ethnic group/breed clusters
4) use distance to determine the ancestry composition of the sample (sketched below)
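Here is a minimal sketch of that PCA/distance idea, using randomly generated reference genotypes as a stand-in for real data; the inverse-distance weighting in the last step is just one arbitrary way to turn distances into a composition-like vector:

```python
import numpy as np

# Hypothetical reference genotype matrix: rows = reference individuals,
# columns = genome-wide SNPs (0/1/2 reference-allele counts). Made-up data.
rng = np.random.default_rng(0)
ref_genotypes = rng.integers(0, 3, size=(30, 200)).astype(float)
ref_labels = np.array(["English"] * 10 + ["German"] * 10 + ["Korean"] * 10)

# Step 1: PCA of the genome-wide SNP matrix via SVD of the centered genotypes
mean = ref_genotypes.mean(axis=0)
centered = ref_genotypes - mean
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Step 2: keep the top principal components (eigenvectors) that separate the groups
n_pcs = 2
pcs = Vt[:n_pcs]                       # eigenvectors of the SNP covariance matrix
ref_coords = centered @ pcs.T          # reference samples in PC space

# Step 3: project the tested sample and measure its distance to each group centroid
test_genotype = rng.integers(0, 3, size=200).astype(float)
test_coords = (test_genotype - mean) @ pcs.T

weights = {}
for pop in np.unique(ref_labels):
    centroid = ref_coords[ref_labels == pop].mean(axis=0)
    dist = np.linalg.norm(test_coords - centroid)
    weights[pop] = 1.0 / (dist + 1e-9)  # step 4: closer cluster -> larger weight

# Normalize the inverse distances into a composition-like vector
total = sum(weights.values())
for pop, w in weights.items():
    print(f"{pop}: {w / total:.2%}")
```

Like the likelihood sketch, this behaves more like a nearest-population classifier than a true admixture estimate: an admixed sample simply lands between clusters in PC space rather than getting a clean percentage breakdown.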
These are my heuristic ways of guessing the ancestry composition. I bet there are smarter ways.
Another technical question: where can we find the reference genomic data for these ethnic groups? HapMap and the 1000 Genomes Project don't seem to have high enough resolution to separate English, German, Korean, Chinese, etc. And where can we find such genomic data for dogs, cats, mice, etc.?