Sorry for yet another basic question, but what exactly is a pairwise genome alignment between two organisms, e.g. human/chicken?
I remember the difference between algorithms for global/local alignments but I don't remember using genome alignments. I looked for a definition online and in several books.
Is it just a file of all of the areas that align between 2 genomes? Are they publicly available or do you have to prepare them yourself? Are they 'redone' between different genome assemblies?
What would be the benefit of aligning the whole genome in this way (if that is indeed the correct interpretation) rather than creating alignments for your areas of interest?
thanks in advance
Genome alignment obviously makes most sense for organisms that are more closely-related. Human-chicken might be interesting, Human-Nanoarchaeum is not.
Genome alignment algorithms are often described as glocal; that is, they try to maximize local alignment whilst trying to include the start/end of one of the pairs. There is quite a good Wikipedia page on sequence alignment, if you need a simple guide or clarification.
Two important applications of genome alignments are:
Finding conserved non-coding sequences (CNS). These are often shared regulatory elements that have important functions. See VISTA's enhancer database, and pay attention to how they use the genome alignments to extract candidates.
Finding conserved synteny and genome rearrangements. Shared synteny (or its disruption, at so-called breakpoints) can be used as a phylogenetic signal to sort out species relationships.
It is not trivial to build genome alignments accurately, and doing so often requires two core steps: generating anchors, then chaining the anchors to form large, unambiguous synteny blocks. For example, in the BLASTZ/CHAIN/NET pipeline, BLASTZ generates anchors and CHAIN/NET groups them; in the LAGAN/SUPERMAP pipeline, LAGAN generates anchors and SUPERMAP groups them.
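To make the "chain the anchors" step concrete, here is a minimal sketch in Python. It is not how CHAIN/NET or SUPERMAP actually work; it is a toy O(n^2) dynamic program that assumes anchors are same-strand intervals and scores a chain simply by total anchor length, with no gap penalties. The `Anchor` type and `best_chain` function are names made up for this illustration.

```python
from typing import List, NamedTuple


class Anchor(NamedTuple):
    """A local match between genome 1 (q) and genome 2 (t), same strand (toy model)."""
    q_start: int
    q_end: int
    t_start: int
    t_end: int


def best_chain(anchors: List[Anchor]) -> List[Anchor]:
    """Return the colinear chain with the largest total anchor length.

    Two anchors may follow each other only if the second starts after the
    first ends in BOTH genomes. Real chainers also penalise gaps and handle
    inversions; this sketch does neither.
    """
    if not anchors:
        return []
    ordered = sorted(anchors, key=lambda a: (a.q_start, a.t_start))
    n = len(ordered)
    score = [a.q_end - a.q_start for a in ordered]  # best chain ending at i
    prev = [-1] * n                                 # back-pointers
    for i in range(n):
        for j in range(i):
            a, b = ordered[j], ordered[i]
            if a.q_end <= b.q_start and a.t_end <= b.t_start:
                candidate = score[j] + (b.q_end - b.q_start)
                if candidate > score[i]:
                    score[i] = candidate
                    prev[i] = j
    # Trace back from the best-scoring end point.
    i = max(range(n), key=score.__getitem__)
    chain = []
    while i != -1:
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]


# Made-up anchors: three lie near the diagonal, one is an off-diagonal hit.
anchors = [
    Anchor(0, 100, 0, 100),
    Anchor(150, 250, 500, 600),   # likely a spurious or paralogous match
    Anchor(120, 200, 120, 200),
    Anchor(300, 400, 300, 400),
]
chain = best_chain(anchors)
```

With these toy coordinates the off-diagonal anchor is left out of the chain, which is the point of chaining: isolated hits that do not fit the colinear path are dropped.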
If you work with vertebrates, there is no need to repeat the exercise yourself. The UCSC Genome Browser offers downloads of pre-built alignments in MAF format.
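If you download one of these MAF files, each alignment block starts with an "a" line followed by one "s" line per sequence (source, start, size, strand, source length, aligned text). Below is a minimal reading sketch in Python, assuming only those two line types matter to you; `MafSeq` and `parse_maf` are names I made up, and the sample block uses invented species names and coordinates. For real work a maintained parser (e.g. in Biopython) is probably a better choice.

```python
from typing import Iterable, Iterator, List, NamedTuple


class MafSeq(NamedTuple):
    src: str       # e.g. "assembly.chromosome"
    start: int     # zero-based start in the source sequence
    size: int      # number of non-gap bases in this row
    strand: str    # "+" or "-"
    src_size: int  # total length of the source sequence
    text: str      # aligned bases, with "-" for gaps


def parse_maf(lines: Iterable[str]) -> Iterator[List[MafSeq]]:
    """Yield one list of MafSeq rows per alignment block.

    Only "a" and "s" lines are interpreted; comments and other line
    types (such as "i", "e", "q") are skipped in this sketch.
    """
    block = None
    for raw in lines:
        fields = raw.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "a":
            if block:
                yield block
            block = []
        elif fields[0] == "s" and block is not None:
            _, src, start, size, strand, src_size, text = fields
            block.append(
                MafSeq(src, int(start), int(size), strand, int(src_size), text)
            )
    if block:
        yield block


# Invented example block, for illustration only.
SAMPLE = """##maf version=1
a score=100.0
s speciesA.chr1 1000 8 + 100000 ACGT-ACGT
s speciesB.chr2 5000 9 + 200000 ACGTTACGT
"""

blocks = list(parse_maf(SAMPLE.splitlines()))
for row in blocks[0]:
    print(row.src, row.start, row.start + row.size, row.text)
```

To read a real file you would pass an open file handle instead of `SAMPLE.splitlines()`.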
There are also quite a few graphical tools, for example MAUVE (often used for prokaryotes). There is also a web-based system called CoGe - they have 10,000+ genomes updated weekly - so you can just pick two genomes and align them using their SynMap pipeline, which also creates genomic dot plots for you. It takes some learning, but it is definitely worth it.
To your last point, why is this better than some local alignments? Well, they are the same animal, if you know which sequences to align. So think of genome alignments as BLAST (find similar sequences) + CLUSTALW (align). Most pipelines also have built-in rules to make it more likely that you find orthologous sequences.
People at UCSC have been using LASTZ. See:
Here is a list of programs:
Also read: "Parameters for accurate genome alignment" Frith et al. BMC Bioinformatics. 2010; 11: 80. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2829014/
Allow me to extend Haibao Tang's response. An alignment of the two genomes will give the blocks of synteny, i.e. of conserved gene order. Knowing the number of blocks can give an idea of how distant the genomes are - not necessarily in years but in terms of genome rearrangement and overall organization. Furthermore, one can look within a block and see what rearrangements took place afterward (after the two organisms diverged from a common ancestor). Was there an expansion or contraction of this or that gene family since that time? That's a pretty fundamental question in terms of evolution and divergence.
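As a toy illustration of counting synteny blocks, here is a small Python sketch. It assumes a strongly simplified representation that I am introducing for the example: orthologous genes are numbered 1..n by their order in genome A, and genome B is written as the same genes in its own order, with a minus sign for genes on the opposite strand. The function name `synteny_blocks` is hypothetical, and real genomes (multiple chromosomes, gene loss, duplications) need more care.

```python
from typing import List


def synteny_blocks(order_in_b: List[int]) -> List[List[int]]:
    """Split genome B's signed gene order into runs that are colinear with genome A.

    A run continues while each gene equals the previous one plus 1. With
    signed numbers this also covers inverted segments, e.g. -6, -5, -4.
    """
    blocks: List[List[int]] = []
    for gene in order_in_b:
        if blocks and gene - blocks[-1][-1] == 1:
            blocks[-1].append(gene)
        else:
            blocks.append([gene])
    return blocks


# Invented example: genes 4-6 are inverted in genome B relative to genome A.
genome_b = [1, 2, 3, -6, -5, -4, 7, 8]
blocks = synteny_blocks(genome_b)
n_breakpoints = len(blocks) - 1  # boundaries between adjacent blocks

print(blocks)         # [[1, 2, 3], [-6, -5, -4], [7, 8]]
print(n_breakpoints)  # 2
```

Fewer, longer blocks suggest fewer rearrangements since the common ancestor; many short blocks suggest the opposite.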
For many of these comparisons, the repeats in the genome fall out and do not enter the alignment. This is because many repeats are rather species-specific: human Alu elements, for example, do not align well to mouse B elements. A comparison of the human and chimpanzee genomes will show where, and maybe when, Alu expansion occurred, as Alus are more conserved between these two.
If you are looking for reliable alignments of prokaryotic genomes, there is the quite old but nevertheless very useful ATGC (Alignable Tight Genomic Clusters) database: http://atgc.lbl.gov/atgc/ . It is optimized for research on microevolution, as it removes much of the variability (rearrangements, recombinations, etc.) from the alignments.