CAGm: a repository of germline microsatellite variations in the 1000 genomes project
We present the Comparative Analysis of Germline Microsatellites (CAGm): a database of germline microsatellites from 2529 individuals in the 1000 genomes project. A key novelty of CAGm is the ability to aggregate microsatellite variation by population, ethnicity (super population) and gender. The database provides advanced searching for microsatellites embedded in genes and functional elements. All data can be downloaded as Microsoft Excel spreadsheets. Two use-case scenarios are presented to demonstrate its utility: a mononucleotide (A) microsatellite at the BAT-26 locus and a dinucleotide (CA) microsatellite in the coding region of FGFRL1. CAGm is freely available at http://www.cagmdb.org/.
The CAGm database contains information on 625,178 microsatellites across 2,529 individuals in the 1000 genomes project. These individuals come from 26 worldwide populations belonging to 5 ethnicities (super populations). The database contains 31,645,227 microsatellite genotypes, an average of 12,513 genotypes per sample. Genotypes can be filtered by their statistical likelihood. Access to the 1,560,636,846 next-generation sequencing reads used for microsatellite genotyping provides a source of validation and further analysis.
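As a rough illustration of the figures and operations described above, the following Python sketch checks the per-sample average and shows one way genotypes might be filtered by likelihood and aggregated by super population. The record layout, field order, sample identifiers and the `min_likelihood` parameter are assumptions made for this example only; they do not reflect the actual CAGm schema or download format.

```python
from collections import defaultdict
from statistics import mean

# Totals quoted in the text.
TOTAL_GENOTYPES = 31_645_227
TOTAL_SAMPLES = 2_529


def genotypes_per_sample(total_genotypes, samples):
    """Average number of genotypes per sample, rounded to the nearest integer."""
    return round(total_genotypes / samples)


def mean_length_by_group(records, min_likelihood=0.0):
    """Mean repeat length per super population, keeping only genotypes
    whose likelihood is at least min_likelihood.

    Each record is a hypothetical tuple:
    (sample_id, super_population, repeat_length, likelihood)
    """
    lengths = defaultdict(list)
    for _sample_id, super_population, repeat_length, likelihood in records:
        if likelihood >= min_likelihood:
            lengths[super_population].append(repeat_length)
    return {group: mean(values) for group, values in lengths.items()}


# Made-up example records (not real CAGm data).
example_records = [
    ("S1", "AFR", 26, 0.99),
    ("S2", "AFR", 24, 0.40),  # dropped by the likelihood filter below
    ("S3", "EUR", 26, 0.95),
]

average = genotypes_per_sample(TOTAL_GENOTYPES, TOTAL_SAMPLES)
by_group = mean_length_by_group(example_records, min_likelihood=0.9)
```

The same grouping function could be keyed on population or gender instead of super population by changing which field is used as the dictionary key.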
Nicholas Kinney, Kyle Titus-Glover, Jonathan D Wren, Robin T Varghese, Pawel Michalak, Han Liao, Ramu Anandakrishnan, Arichanah Pulenthiran, Lin Kang, Harold R Garner; CAGm: a repository of germline microsatellite variations in the 1000 genomes project, Nucleic Acids Research, gky969, https://doi.org/10.1093/nar/gky969