Question: Pairwise Genome Alignment
3.5 years ago by
Andrea_Bio (2.1k) wrote:


Sorry for yet another basic question, but what exactly is a pairwise genome alignment between 2 organisms, e.g. human/chicken?

I remember the difference between algorithms for global/local alignments but I don't remember using genome alignments. I looked for a definition online and in several books.

Is it just a file of all of the areas that align between 2 genomes? Are they publicly available or do you have to prepare them yourself? Are they 'redone' between different genome assemblies?

What would be the benefit of aligning the whole genome in this way (if that is indeed the correct interpretation) rather than creating alignments for just your areas of interest?

thanks in advance

modified 2.6 years ago by Larry_Parnell (15k) • written 3.5 years ago by Andrea_Bio (2.1k)

Consider one thing: the human genome was sequenced from BAC clones, while the chicken genome was probably done by shotgun sequencing. This means that in the chicken genome, duplicated regions will tend to be clustered together, and there will be more holes in the most repetitive regions. So a genome-vs-genome alignment can contain artifacts that arise because the two genomes were sequenced with different techniques.

written 3.5 years ago by Giovanni M Dall'Olio (18k)
3.5 years ago by
Rockville, MD
Haibao Tang (2.7k) wrote:

Two important applications of genome alignments are:

  • Find conserved non-coding sequences (CNS). These are often shared regulatory elements that have important functions. See VISTA's enhancer database, and pay attention to how they use the genome alignments to extract candidates.

  • Find conserved synteny and genome rearrangements. Shared synteny (or its disruption, called breakpoints) can be used as a phylogenetic signal to sort out species relationships.

It is not trivial to build genome alignments accurately, and it often requires two core steps: generating anchors, then chaining the anchors to form large, unambiguous synteny blocks. For example, in the BLASTZ/CHAIN/NET pipeline, BLASTZ generates the anchors and CHAIN/NET groups them; in the LAGAN/SUPERMAP pipeline, LAGAN generates the anchors and SUPERMAP groups them.
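To make the "chain the anchors" step concrete, here is a minimal sketch of collinear chaining in Python. It is not what CHAIN/NET or SUPERMAP actually implement: the `(x, y, score)` anchor layout, the function name and the toy numbers are all assumptions for illustration. Real chaining tools typically also penalise gaps between anchors and handle reverse-strand matches, which this sketch ignores.

```python
from typing import List, Tuple

# Hypothetical anchor layout: (position in genome A, position in genome B, score).
Anchor = Tuple[int, int, int]


def chain_anchors(anchors: List[Anchor]) -> List[Anchor]:
    """Return the highest-scoring chain whose A and B positions both strictly increase."""
    if not anchors:
        return []
    ordered = sorted(anchors)
    best = [a[2] for a in ordered]   # best chain score ending at each anchor
    prev = [-1] * len(ordered)       # back-pointer to the previous anchor in that chain
    for i, (xi, yi, si) in enumerate(ordered):
        for j in range(i):
            xj, yj, _ = ordered[j]
            if xj < xi and yj < yi and best[j] + si > best[i]:
                best[i] = best[j] + si
                prev[i] = j
    # Trace back from the anchor that ends the best-scoring chain.
    i = max(range(len(ordered)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]


# Toy anchors (made up): two of them are off the main diagonal.
anchors = [(100, 120, 50), (200, 90, 30), (300, 310, 40), (400, 420, 60), (450, 200, 80)]
chain = chain_anchors(anchors)
print(chain)
```

The quadratic loop is fine for a handful of anchors; whole-genome pipelines need something much faster, but the idea of keeping only a consistent, ordered subset of anchors is the same.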

If you work with vertebrates, there is no need to repeat the exercise yourself. The UCSC Genome Browser offers downloads of pre-built alignments in MAF format.
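If you have never looked inside a MAF file: each alignment block starts with an "a" line, followed by one "s" line per sequence (source, start, size, strand, source length, aligned text). Below is a small reader sketch in Python. The sample text is made up (species names, coordinates and chromosome sizes are not real), the function name is hypothetical, and other MAF line types such as "i", "e" and "q" are simply skipped.

```python
from typing import Dict, Iterable, Iterator, List

# Toy MAF text: species names, coordinates and chromosome sizes are made up.
SAMPLE_MAF = """\
##maf version=1
a score=100.0
s human.chr1   1000 8 + 50000 ACGT-ACGT
s chicken.chr2 5000 9 - 40000 ACGTTACGT

a score=42.0
s human.chr1   2000 4 + 50000 GGCC
s chicken.chr2 7000 4 + 40000 GGCA
"""


def read_maf_blocks(lines: Iterable[str]) -> Iterator[List[Dict[str, object]]]:
    """Yield each alignment block as a list of its 's' rows."""
    block: List[Dict[str, object]] = []
    for raw in lines:
        line = raw.strip()
        if line == "a" or line.startswith("a "):
            # An 'a' line opens a new block; emit the previous one first.
            if block:
                yield block
            block = []
        elif line.startswith("s "):
            _, src, start, size, strand, src_size, text = line.split()
            block.append({
                "src": src,                 # e.g. assembly.chromosome
                "start": int(start),        # zero-based start on the given strand
                "size": int(size),          # number of non-gap bases in this row
                "strand": strand,
                "src_size": int(src_size),  # length of the whole source sequence
                "text": text,               # aligned bases, '-' marks a gap
            })
    if block:
        yield block


blocks = list(read_maf_blocks(SAMPLE_MAF.splitlines()))
for block in blocks:
    print([(row["src"], row["start"], row["strand"]) for row in block])
```

For a real file you would pass an open file handle instead of `SAMPLE_MAF.splitlines()`.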

There are also quite a few graphical tools, for example MAUVE (often used for prokaryotes). There is also a web-based system called CoGe (they have 10,000+ genomes, updated weekly), so you can just pick two genomes and align them using their SynMap pipeline, which also creates genomic dot plots for you. It takes some learning, but it is definitely worth it.
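In case the term is unfamiliar, a dot plot is just a scatter of positions where the two sequences share a match. The sketch below computes such points from exact shared k-mers. This is only an illustration of the idea, not how SynMap builds its plots; the function name, the value of `k` and the toy sequences are assumptions, and reverse-complement matches are ignored.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def kmer_matches(seq_a: str, seq_b: str, k: int = 4) -> List[Tuple[int, int]]:
    """Return (position in seq_a, position in seq_b) for every exact shared k-mer."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        index[seq_a[i:i + k]].append(i)
    points = []
    for j in range(len(seq_b) - k + 1):
        # .get() avoids inserting empty entries into the defaultdict.
        for i in index.get(seq_b[j:j + k], []):
            points.append((i, j))
    return points


# Toy sequences (made up). Each point would be one dot on the plot.
points = sorted(kmer_matches("ACGTACGT", "TTACGTAA", k=4))
print(points)
```

Runs of dots along a diagonal correspond to collinear (syntenic) stretches; plotting the points with any scatter-plot tool gives the familiar picture.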

To your last point, why this is better than some local alignments: well, they are the same animal if you know which sequences to align. So think of genome alignment as BLAST (find similar sequences) + CLUSTALW (align). Most pipelines also have built-in rules to make it more likely that you find orthologous sequences.

modified 3.5 years ago • written 3.5 years ago by Haibao Tang (2.7k)

great answer, many thanks

written 3.4 years ago by Andrea_Bio (2.1k)
3.5 years ago by
Sydney, Australia
Neilfws (41k) wrote:
  1. Yes, a pairwise genome alignment is essentially a file of two aligned genomes.
  2. Some are available, e.g. VISTA genome alignments. There are also plenty of software tools available to do it yourself. A popular tool is MUMmer; another is LAGAN.
  3. Are they re-done as genome assemblies are revised? That would depend on whoever maintains the data. Hopefully, and in the best cases, yes they are.
  4. The benefit is that you can visualize large "blocks" of genome structure. These may include synteny (see this resource on yeast genome synteny) or large-scale rearrangements: duplication, deletion, inversion.

Genome alignment obviously makes most sense for organisms that are more closely related. Human-chicken might be interesting; human-Nanoarchaeum is not.

Genome alignment algorithms are often described as glocal; that is, they try to maximize local alignment whilst trying to include the start/end of one of the pairs. There is quite a good Wikipedia page on sequence alignment, if you need a simple guide or clarification.
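As a toy illustration of the "include the start/end of one of the pair" idea, here is a sketch of a semi-global ("fit") alignment score in Python: the whole query must be aligned, but the unaligned ends of the target are free. This is a generic textbook-style dynamic programme, not the algorithm of any particular genome aligner, and the function name and scoring values are assumptions.

```python
from typing import Tuple


def fit_alignment_score(query: str, target: str,
                        match: int = 1, mismatch: int = -1, gap: int = -2) -> Tuple[int, int]:
    """Best score for aligning all of `query` to some substring of `target`.

    Returns (score, end), where `end` is the target position just after the alignment.
    """
    prev = [0] * (len(target) + 1)      # skipping a target prefix costs nothing
    for i, q in enumerate(query, start=1):
        curr = [i * gap]                # query bases aligned before the target starts
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap          # query base aligned to a gap
            left = curr[j - 1] + gap    # target base aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    # Skipping a target suffix is also free, so take the best cell of the last row.
    end = max(range(len(prev)), key=prev.__getitem__)
    return prev[end], end


result = fit_alignment_score("ACGT", "TTTACGTTTT")
print(result)
```

A pure global alignment would instead charge for the unaligned target ends, and a pure local alignment would also allow the ends of the query to be dropped.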

modified 3.5 years ago • written 3.5 years ago by Neilfws (41k)

thank you for a thorough answer

written 3.4 years ago by Andrea_Bio (2.1k)

Do you take one organism as the 'reference' and then align the other organism to it? Naturally, there won't be a one-to-one correspondence between chromosomes in the 2 organisms.

written 3.4 years ago by Andrea_Bio (2.1k)

You'd align pairs of chromosomes.

written 3.4 years ago by Neilfws (41k)
3.5 years ago by
Pierre Lindenbaum (58k) wrote:

People at UCSC have been using LASTZ. See:

written 3.5 years ago by Pierre Lindenbaum (58k)

LASTZ is new to me. Thanks for this one.

written 3.4 years ago by Khader Shameer (14k)
3.4 years ago by
Barcelona, Spain
Darked89 (3.5k) wrote:

Here is a list of programs:

Also read: "Parameters for accurate genome alignment" Frith et al. BMC Bioinformatics. 2010; 11: 80.

written 3.4 years ago by Darked89 (3.5k)
3.4 years ago by
Boston, MA USA
Larry_Parnell (15k) wrote:

Allow me to extend Haibao Tang's response. An alignment of the two genomes will give the blocks of synteny, that is, of conserved gene order. Knowing the number of such blocks can give an idea of how distant the genomes are, not necessarily in years but in terms of genome rearrangement and overall organization. Furthermore, one can look within a block and see what rearrangements took place afterward (after the two organisms diverged from a common ancestor). Was there an expansion or contraction of this or that gene family since that time? That's a pretty fundamental question in terms of evolution and divergence.
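To show what "counting blocks of conserved gene order" can look like, here is a small Python sketch that splits one genome's gene order into runs that are also contiguous (forward or inverted) in the other genome. It is a deliberate simplification: it assumes each gene ID occurs exactly once per genome (one-to-one orthologs), the gene names are made up, and real synteny tools work from alignment coordinates and tolerate small gaps.

```python
from typing import List, Sequence


def synteny_blocks(order_a: Sequence[str], order_b: Sequence[str]) -> List[List[str]]:
    """Split genome A's gene order into maximal runs that are contiguous in genome B."""
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    shared = [g for g in order_a if g in pos_b]
    # Rank B positions among shared genes only, so lineage-specific genes do not break a block.
    rank_b = {g: r for r, g in enumerate(sorted(shared, key=pos_b.__getitem__))}
    blocks: List[List[str]] = []
    for gene in shared:
        if blocks:
            last = blocks[-1]
            step = rank_b[gene] - rank_b[last[-1]]
            # Extend the block only if the gene is adjacent in B and the direction is unchanged.
            same_direction = len(last) == 1 or step == rank_b[last[-1]] - rank_b[last[-2]]
            if abs(step) == 1 and same_direction:
                last.append(gene)
                continue
        blocks.append([gene])
    return blocks


# Toy gene orders (made up): genes g4..g6 are inverted in genome B.
genome_a = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
genome_b = ["g1", "g2", "g3", "g6", "g5", "g4", "g7", "g8"]
blocks = synteny_blocks(genome_a, genome_b)
print(len(blocks), blocks)
```

In this toy case a single inversion produces three blocks, which is the kind of count that hints at how rearranged two genomes are relative to each other.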

For many of these comparisons, the repeats in the genome fall out and do not enter the alignment. This is because many repeats are rather species-specific; for example, human Alu elements do not align well to mouse B elements. A comparison of the human and chimpanzee genomes will show where, and maybe when, Alu expansion occurred, as Alus are more conserved between these two species.

modified 3.4 years ago • written 3.4 years ago by Larry_Parnell (15k)
3.4 years ago by
Pawel Szczesny (2.6k) wrote:

If you are looking for reliable alignments of prokaryotic genomes, there is the quite old but nevertheless very useful ATGC (Alignable Tight Genomic Clusters) database. It is optimized for research on microevolution, as it removes much of the variability (rearrangements, recombinations, etc.) from the alignments.

written 3.4 years ago by Pawel Szczesny (2.6k)