4.7 years ago by
Two important applications of genome alignments are:
Finding conserved non-coding sequences (CNSs). Many of these are shared regulatory elements with important functions. See VISTA's enhancer database, and pay attention to how it uses genome alignments to extract candidate elements.
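To make the CNS idea concrete, here is a minimal sketch of scanning a pairwise alignment for windows that are both highly conserved and non-coding. The function names, the `coding_mask` input (one boolean per alignment column), and the thresholds are assumptions for illustration; the 70%-identity-over-100-bp default is loosely modeled on criteria often used in VISTA-style analyses, not VISTA's actual code.

```python
from typing import List, Sequence, Tuple


def merge_ranges(ranges: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or touching (start, end) column ranges."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def conserved_noncoding_windows(
    ref_aln: str,
    qry_aln: str,
    coding_mask: Sequence[bool],  # hypothetical annotation: True = coding column
    window: int = 100,            # assumed window size
    min_identity: float = 0.70,   # assumed identity threshold
) -> List[Tuple[int, int]]:
    """Return merged column ranges of conserved, non-coding windows."""
    if not (len(ref_aln) == len(qry_aln) == len(coding_mask)):
        raise ValueError("alignment rows and mask must have equal length")
    ref, qry = ref_aln.upper(), qry_aln.upper()
    hits = []
    for start in range(len(ref) - window + 1):
        end = start + window
        # Skip any window that touches an annotated coding column.
        if any(coding_mask[start:end]):
            continue
        # Count identical, non-gap columns in this window.
        matches = sum(1 for a, b in zip(ref[start:end], qry[start:end]) if a == b and a != "-")
        if matches / window >= min_identity:
            hits.append((start, end))
    return merge_ranges(hits)
```

For example, with two identical 150-column rows and the first 50 columns marked as coding, only the window starting at column 50 qualifies, giving `[(50, 150)]`.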
Finding conserved synteny and genome rearrangements. Shared synteny, or its disruption (the disruption points are called breakpoints), can be used as a phylogenetic signal to resolve species relationships.
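As a small illustration of breakpoints as a signal, the sketch below counts adjacencies in one signed gene order that are not preserved in another. It assumes a single chromosome, the same gene set in both genomes, and signed integers for gene orientation; this is the textbook breakpoint idea, not the method of any specific tool.

```python
from typing import List, Set, Tuple


def adjacencies(order: List[int]) -> Set[Tuple[int, int]]:
    """All adjacencies of a signed gene order, recorded in both reading directions."""
    adj = set()
    for a, b in zip(order, order[1:]):
        adj.add((a, b))
        adj.add((-b, -a))  # the same adjacency read from the opposite strand
    return adj


def breakpoints(order_a: List[int], order_b: List[int]) -> List[Tuple[int, int]]:
    """Adjacencies present in order_a but broken in order_b."""
    adj_b = adjacencies(order_b)
    return [(a, b) for a, b in zip(order_a, order_a[1:]) if (a, b) not in adj_b]
```

Inverting the block of genes 3 and 4, e.g. comparing `[1, 2, 3, 4, 5]` with `[1, 2, -4, -3, 5]`, yields two breakpoints, `(2, 3)` and `(4, 5)`; fewer shared breakpoints between two species generally suggests a closer relationship.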
Building accurate genome alignments is not trivial, and it usually requires two core steps: generating anchors, and chaining those anchors into large, unambiguous synteny blocks. For example, in the BLASTZ/CHAIN/NET pipeline, BLASTZ generates the anchors and CHAIN/NET groups them; in the LAGAN/SUPERMAP pipeline, LAGAN generates the anchors and SUPERMAP groups them.
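The chaining step can be sketched as a simple dynamic program: sort anchors, and for each anchor find the best-scoring predecessor that ends before it on both genomes. This is a generic O(n²) colinear chaining sketch with an assumed linear gap cost; the `Anchor` tuple and `chain_anchors` function are hypothetical, and real tools such as CHAIN use their own scoring and faster search.

```python
from typing import List, NamedTuple


class Anchor(NamedTuple):
    ref_start: int
    qry_start: int
    length: int
    score: float


def chain_anchors(anchors: List[Anchor], gap_cost: float = 0.0) -> List[Anchor]:
    """Return the highest-scoring colinear, non-overlapping chain of anchors."""
    if not anchors:
        return []
    order = sorted(anchors, key=lambda a: (a.ref_start, a.qry_start))
    best = [a.score for a in order]  # best chain score ending at each anchor
    prev = [-1] * len(order)         # back-pointer to the previous anchor in that chain
    for i, ai in enumerate(order):
        for j in range(i):
            aj = order[j]
            # aj must end before ai starts on BOTH genomes (colinear, non-overlapping).
            if aj.ref_start + aj.length <= ai.ref_start and aj.qry_start + aj.length <= ai.qry_start:
                gap = (ai.ref_start - aj.ref_start - aj.length) + (ai.qry_start - aj.qry_start - aj.length)
                candidate = best[j] + ai.score - gap_cost * gap
                if candidate > best[i]:
                    best[i] = candidate
                    prev[i] = j
    # Trace back from the best-scoring chain end.
    i = max(range(len(order)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(order[i])
        i = prev[i]
    return chain[::-1]
```

Given three anchors on the main diagonal plus one off-diagonal anchor with a higher individual score, the chain keeps the three colinear anchors, which is exactly the "large, unambiguous block" behavior described above.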
If you work with vertebrates, there is no need to repeat the exercise yourself: the UCSC Genome Browser offers pre-built alignments for download in MAF format.
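Once you download a MAF file, each alignment block starts with an `a` line and lists one `s` line per species (source, start, size, strand, source size, aligned text). Below is a minimal reader for just those two line types; `parse_maf` is a hypothetical helper, and for real work established readers such as those in Biopython or bx-python are a safer choice.

```python
from typing import Dict, Iterable, List


def parse_maf(lines: Iterable[str]) -> List[Dict]:
    """Parse 'a' and 's' lines of a MAF file into a list of alignment blocks."""
    blocks: List[Dict] = []
    for raw in lines:
        fields = raw.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank lines and header/comment lines
        if fields[0] == "a":
            # Block header, e.g. "a score=100.0"
            attrs = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
            blocks.append({"attrs": attrs, "rows": []})
        elif fields[0] == "s" and blocks:
            src, start, size, strand, src_size, text = fields[1:7]
            blocks[-1]["rows"].append({
                "src": src,
                "start": int(start),  # 0-based start on the given strand
                "size": int(size),    # number of non-gap bases in this row
                "strand": strand,
                "src_size": int(src_size),
                "text": text,         # aligned sequence, '-' for gaps
            })
        # 'i', 'e' and 'q' lines are ignored in this sketch.
    return blocks
```

A usage example with a made-up two-species block:

```python
example = """##maf version=1
a score=100.0
s hg38.chr1 100 5 + 248956422 ACG-TA
s mm10.chr4 200 6 - 156508116 ACGGTA
""".splitlines()
blocks = parse_maf(example)
```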
There are also quite a few graphical tools, for example MAUVE (often used for prokaryotes). There is also a web-based system called CoGe: it has 10,000+ genomes, updated weekly, so you can just pick two genomes and align them using its SynMap pipeline, which also creates genomic dot plots for you. It takes some learning, but it is definitely worth it.
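To see what a dot plot encodes, the sketch below collects the coordinates of exact k-mer matches between two sequences; plotting those points gives a basic dot plot. This is a generic, swapped-in nucleotide k-mer approach (forward strand only, `k` chosen arbitrarily), not how SynMap builds its plots.

```python
from collections import defaultdict
from typing import List, Tuple


def kmer_dotplot(seq_a: str, seq_b: str, k: int = 8) -> List[Tuple[int, int]]:
    """Return (i, j) where seq_a[i:i+k] == seq_b[j:j+k] (forward strand only)."""
    index = defaultdict(list)
    # Index every k-mer position in seq_b.
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)
    points = []
    # Look up each k-mer of seq_a; each shared k-mer becomes one dot.
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], ()):
            points.append((i, j))
    return points
```

Colinear, conserved regions show up as diagonal runs of points; off-diagonal runs hint at repeats or rearrangements. Inversions would appear only if you also index the reverse complement, which this sketch skips.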
To your last point, about why this is better than a set of local alignments: they are essentially the same animal, provided you already know which sequences to align. So think of genome alignments as BLAST (find similar sequences) + CLUSTALW (align them). Most pipelines also have built-in rules that make it more likely you end up with orthologous sequences.
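One widely used rule of thumb for favoring orthologs is the reciprocal best hit: keep a pair only if each sequence is the other's top-scoring match. The sketch below assumes hits are given as `(query, subject, score)` tuples from searches in both directions; it is one common heuristic, not the specific rule built into any of the pipelines mentioned above.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query, subject, score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit AND a is b's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)
```

Pairs that fail the reciprocal test (for example, a paralog whose best hit points back to a different gene) are dropped, which is the kind of filtering that pushes results toward orthologs.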
modified 4.7 years ago
4.7 years ago by
Haibao Tang