Question: Software For Inferring Population Structure
7.1 years ago by
London, UK
Can we make a list of software for inferring population structure from genotype or sequencing data?

This type of software is used to infer how a set of individuals can be subdivided into groups, given their genotypes or genomes. A typical example is the classification of Kenyan thrush samples into three different populations, described in the original paper of the Structure software. In that case, population structure software was used to determine whether the collected samples belonged to a single population or to several sub-populations, and to identify outliers.

(Figure: Kenyan thrush example from the Structure paper)
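To make the classification idea concrete, here is a minimal sketch of the simplest case, roughly Structure's "no admixture" model with the population allele frequencies treated as known: the individual's genotype likelihood is computed under Hardy-Weinberg equilibrium in each candidate population and normalised into posterior membership probabilities. The function names, the uniform prior, independent loci and the biallelic 0/1/2 genotype coding are assumptions for illustration; Structure itself estimates allele frequencies and memberships jointly by MCMC rather than taking frequencies as given.

```python
import math

def genotype_log_prob(g, p):
    """Log probability of a diploid genotype g (count of the counted allele:
    0, 1 or 2) under Hardy-Weinberg equilibrium with allele frequency p."""
    if g == 2:
        return 2 * math.log(p)
    if g == 1:
        return math.log(2 * p * (1 - p))
    return 2 * math.log(1 - p)

def assign(genotypes, freqs):
    """Posterior probability that one individual comes from each population.

    genotypes: list of allele counts, one per locus.
    freqs: dict mapping population name -> list of allele frequencies
           (one per locus, strictly between 0 and 1).
    Assumes (hypothetically) a uniform prior over populations and
    independent loci.
    """
    log_lik = {
        pop: sum(genotype_log_prob(g, p) for g, p in zip(genotypes, f))
        for pop, f in freqs.items()
    }
    # Subtract the maximum log-likelihood before exponentiating to avoid underflow.
    top = max(log_lik.values())
    weights = {pop: math.exp(ll - top) for pop, ll in log_lik.items()}
    total = sum(weights.values())
    return {pop: w / total for pop, w in weights.items()}
```

For example, with `freqs = {"pop1": [0.9, 0.8, 0.1], "pop2": [0.1, 0.2, 0.9]}`, an individual with genotypes `[2, 2, 0]` is assigned to `pop1` with a posterior probability very close to 1.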

You can also find many examples on the Dodecad Project blog and on Dienekes's blog.

I think that Structure is the most popular software in this field, but a lot of new options have been published recently. Can you share your list of software, and give your thoughts on which one is your favorite?

7.1 years ago by Giovanni M Dall'Olio
7.1 years ago by
Quebec City
There is also a program called ADMIXTURE.

D.H. Alexander, J. Novembre, and K. Lange. Fast model-based estimation of ancestry in unrelated individuals. Genome Research, 19:1655–1664, 2009.

I like ADMIXTURE because it is super fast on large datasets. Also check out Distruct, which makes really nice plots from both STRUCTURE and ADMIXTURE output.

7.1 years ago by
Botond Sipos
United Kingdom
Structurama, which implements the method described in:

Huelsenbeck JP, Andolfatto P (2007) Inference of population structure under a Dirichlet process model. Genetics 175(4):1787–1802.
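The key ingredient of a Dirichlet process model is its prior over partitions of the individuals, which does not fix the number of populations in advance. The sketch below draws one partition from the Chinese restaurant process representation of that prior; the function name and the concentration parameter `alpha` are illustrative assumptions. It shows only the prior: Structurama combines such a prior with the genotype data and samples the posterior by MCMC, which this sketch does not attempt.

```python
import random

def sample_crp_partition(n, alpha, rng=None):
    """Draw a partition of n individuals from a Chinese restaurant process
    with concentration parameter alpha. Returns one cluster label per
    individual; labels are numbered 0, 1, 2, ... in order of first use."""
    if rng is None:
        rng = random.Random()
    labels = []
    counts = []  # counts[k] = number of individuals already in cluster k
    for _ in range(n):
        # An existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels
```

Larger values of `alpha` tend to produce more clusters; with `alpha` close to zero almost every draw puts all individuals into a single cluster.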

7.1 years ago by
Boston, MA USA
My colleague has done some extensive work on this with a Puerto Rican population. The individuals trace their ancestry to three ancestral populations: European settlers, native Taíno Indians, and West Africans. He used two programs, STRUCTURE 2.2 (Falush et al. 2003; Pritchard et al. 2000) and IAE3CI (Tsai et al. 2005; Parra et al. 2001); he then used the EIGENSTRAT method (Price et al. 2006), as implemented in HelixTree (Golden Helix, Bozeman, MT, USA), to calculate the principal components from the genotypes of 100 ancestry informative markers in the population.
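The PCA step can be sketched as follows, assuming a NumPy array of 0/1/2 allele counts with one row per individual and one column per marker; the function name is hypothetical. Each marker is centred and scaled by sqrt(p(1 - p)), following the normalisation described for EIGENSTRAT, and the top principal components are taken from an SVD. This is not HelixTree's or EIGENSOFT's code, and details such as the allele-frequency estimate and outlier removal may differ.

```python
import numpy as np

def eigenstrat_pcs(genotypes, n_components=2):
    """Principal-component scores of individuals from a genotype matrix.

    genotypes: array of shape (n_individuals, n_markers) with allele counts 0/1/2.
    Each marker is centred by its mean and scaled by sqrt(p(1-p)), where p is
    the sample allele frequency. Monomorphic markers are dropped because
    their scale factor would be zero.
    """
    g = np.asarray(genotypes, dtype=float)
    mean = g.mean(axis=0)
    p = mean / 2.0
    keep = (p > 0) & (p < 1)
    x = (g[:, keep] - mean[keep]) / np.sqrt(p[keep] * (1 - p[keep]))
    # SVD of the normalised matrix: columns of u scaled by s are the PC scores.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]
```

With two groups that differ in allele frequency at most markers, the first component typically separates them, with the two groups on opposite sides of zero. The sign of each component is arbitrary, so only relative positions are meaningful.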

Added in edit 27 Jan 2012: keep in mind that these admixture calculations work best when the frequencies of the informative markers in each of the ancestral populations are known.
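To illustrate why known ancestral frequencies help, here is a minimal sketch that estimates one individual's ancestry proportions q when the allele frequencies f of the K ancestral populations are supplied. It maximises the binomial admixture likelihood in which the counted allele at marker j has probability p_j = sum_k q_k f_kj, using simple expectation-maximisation (EM) updates. The function name and input layout are assumptions; ADMIXTURE maximises the same kind of likelihood but with its own, faster optimiser, and it normally estimates the allele frequencies as well.

```python
import numpy as np

def estimate_admixture(genotypes, freqs, n_iter=500, tol=1e-10):
    """EM estimate of one individual's ancestry proportions, with the
    ancestral allele frequencies held fixed.

    genotypes: length-J array of counted-allele counts (0, 1 or 2).
    freqs: array of shape (K, J); freqs[k, j] is the counted-allele
           frequency at marker j in ancestral population k (0 < f < 1).
    """
    g = np.asarray(genotypes, dtype=float)
    f = np.asarray(freqs, dtype=float)
    k = f.shape[0]
    q = np.full(k, 1.0 / k)  # start from equal proportions
    for _ in range(n_iter):
        # Probability of the counted allele at each marker: p_j = sum_k q_k f_kj.
        p = q @ f
        # E-step: expected number of counted (a) and other (b) allele copies
        # at each marker that descend from each ancestral population.
        a = g * (q[:, None] * f) / p
        b = (2 - g) * (q[:, None] * (1 - f)) / (1 - p)
        # M-step: each population's share of the 2J allele copies.
        q_new = (a + b).sum(axis=1) / (2 * g.size)
        if np.abs(q_new - q).max() < tol:
            q = q_new
            break
        q = q_new
    return q
```

With very informative markers (frequency 0.99 in one population and 0.01 in the other), an individual homozygous for the counted allele everywhere gets q close to (1, 0), and one heterozygous everywhere gets q = (0.5, 0.5). When the frequencies are similar across populations, the likelihood is nearly flat and q is poorly determined.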

Falush D, Stephens M, Pritchard JK (2003) Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164:1567–1587.

Parra EJ, Kittles RA, Argyropoulos G, et al. (2001) Ancestral proportions and admixture dynamics in geographically defined African Americans living in South Carolina. Am J Phys Anthropol 114:18–29.

Price AL, Patterson NJ, Plenge RM, et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38:904–909.

Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155:945–959.

Tsai HJ, Choudhry S, Naqvi M, et al. (2005) Comparison of three methods to estimate genetic ancestry and control for stratification in genetic association studies among admixed populations. Hum Genet 118:424–433.

7.1 years ago by
I have used eBURST in the past, and it can be quite useful, although it only takes MLST data. It was designed to visualise population structure in bacteria and is definitely worth a look.

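eBURST's default grouping rule can be sketched in a few lines: sequence types (STs) are linked when they are single-locus variants (identical at all but one MLST locus), groups are the connected components of those links, and the predicted founder is the ST with the most single-locus variants. This is a simplified, hypothetical reimplementation for illustration only; the function name and input format are assumptions, ties between candidate founders are broken arbitrarily here, and the real program also reports bootstrap support for its founders.

```python
from itertools import combinations

def burst_groups(profiles, min_shared=None):
    """Group MLST sequence types in the spirit of eBURST's default rule.

    profiles: dict mapping ST id -> tuple of allele numbers (one per locus).
    Two STs are linked when they share at least `min_shared` alleles
    (default: all loci but one). Returns (founder, sorted members) per group.
    """
    sts = list(profiles)
    n_loci = len(next(iter(profiles.values())))
    if min_shared is None:
        min_shared = n_loci - 1

    links = {st: set() for st in sts}
    slv_count = {st: 0 for st in sts}  # number of single-locus variants per ST
    for a, b in combinations(sts, 2):
        shared = sum(x == y for x, y in zip(profiles[a], profiles[b]))
        if shared >= min_shared:
            links[a].add(b)
            links[b].add(a)
        if shared == n_loci - 1:
            slv_count[a] += 1
            slv_count[b] += 1

    groups, seen = [], set()
    for st in sts:
        if st in seen:
            continue
        # Depth-first search collects one connected component (one group).
        stack, component = [st], []
        seen.add(st)
        while stack:
            cur = stack.pop()
            component.append(cur)
            for nb in links[cur]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        founder = max(component, key=lambda s: slv_count[s])
        groups.append((founder, sorted(component)))
    return groups
```

For instance, an ST that is a single-locus variant of two other STs becomes the predicted founder of their group, while an ST sharing no more than five of seven alleles with any other ST forms a singleton group.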

