In teaching an undergraduate bioinformatics module, I've been contemplating how to engage the students using examples of historic milestones in the application of sequence alignment. I have a few good examples in mind, but I'd be interested in recommendations and suggestions. But here's what I don't want: Examples that show off algorithmic cleverness or bioinformatics computing power as such. These are life sciences students, and I want to provide examples of real biological advances that have utilized sequence alignment.
I can offer a great example: the discovery of the APOA5 gene, which encodes an important apolipoprotein involved in cholesterol and triglyceride homeostasis. The gene was discovered by aligning human and mouse genomic sequences and noticing peaks of higher-than-expected similarity, which turned out to be the APOA5 exons. This work is elegantly described by Pennacchio et al. (2001), who also elucidated the gene's role in triglyceride (TG) homeostasis. It has since been shown in numerous populations worldwide that variation in the human APOA5 gene leads to differences in TG levels. In some populations, the SNP-TG association is modified by intake of certain dietary fats. In other words, the risk allele only confers risk when the diet contains too much or too little of a certain component.
It is rare for such an important gene to have been unknown (not simply poorly described, but completely unknown) prior to 2001. Furthermore, this is a nice concrete example of how such a gene can be discovered by a simple alignment of genomic sequences, and the same comparative approach underlies the discovery of regulatory elements (along the lines of ENCODE, Jim Noonan's excellent work and Kate Pollard's human accelerated regions, or HARs).
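For students who want to see what "peaks of higher-than-expected similarity" means computationally, here is a minimal sketch that slides a window along an existing gapped human/mouse alignment and reports percent identity per window. The function names and the defaults (70% identity over 100-column windows, loosely modeled on the criterion often used in VISTA-style conservation plots) are assumptions for illustration, not the procedure from Pennacchio et al.

```python
def window_identity(aln_a: str, aln_b: str, window: int = 100, step: int = 25):
    """Yield (start, percent_identity) for sliding windows over a gapped
    pairwise alignment. Gap columns count as mismatches. Trailing columns
    that do not fill a final step are skipped."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        return
    for start in range(0, max(len(aln_a) - window, 0) + 1, step):
        cols = zip(aln_a[start:start + window], aln_b[start:start + window])
        # A column is a match only if both residues agree and neither is a gap.
        matches = sum(1 for x, y in cols if x == y and x != "-")
        span = min(window, len(aln_a) - start)
        yield start, 100.0 * matches / span


def conserved_peaks(aln_a: str, aln_b: str, window: int = 100,
                    step: int = 25, threshold: float = 70.0):
    """Return the (start, identity) windows at or above the threshold."""
    return [(s, pid) for s, pid in window_identity(aln_a, aln_b, window, step)
            if pid >= threshold]


# Toy example: a conserved block, a divergent block, then a mostly conserved block.
human = "ACGTACGTACGT"
mouse = "ACGTTTTTACGA"
print(conserved_peaks(human, mouse, window=4, step=4, threshold=75.0))
```

On real data the two strings would come from a whole-region alignment; windows that stand out above the background level are the candidates (exons or regulatory elements) worth a closer look.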
There is a pretty famous story from the early 1990s about yeast proteins (such as RAD51) involved in recombination and DNA repair that show striking similarity to the bacterial RecA protein, providing evidence that these processes share a common origin in eukaryotes and prokaryotes. See, for example (several similar papers appeared around the same time): http://www.ncbi.nlm.nih.gov/pubmed/1581961
I remember Doug Bishop telling us in graduate school that this was one of the first cases in which database searches and sequence alignment revealed a common biological process shared by eukaryotes and prokaryotes, and that sequence similarity really drove the biological discovery.
A counterexample might be the sequence of HIV, which Robert Gallo called "HTLV-III" in 1985, compared with HTLV-I.
Notice how tenuous the alignments to HTLV-I are, even among conserved proteins. We now know that HIV has little to do with HTLV-I beyond both being retroviruses that infect humans.
You can really feel Gallo's incredible force of will shoved down the throat of reality.
See "Complete nucleotide sequence of the AIDS virus, HTLV-III."
The original alignment algorithms were published by:
Needleman and Wunsch (1970): http://genome.crg.es/seminars/Alineator/papers/needleman70.pdf
Smith and Waterman (1981): http://ibi.zju.edu.cn/bioinplant/courses/smithandwaterman1981.pdf
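To show students the practical difference between these two papers, here is a minimal sketch of the textbook dynamic-programming recurrence with a simple linear gap penalty. It is a modern simplification, not a transcription of either paper; the scoring values (match +1, mismatch -1, gap -2) and the name `align_score` are illustrative assumptions.

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -2, local: bool = False) -> int:
    """Optimal alignment score with a linear gap penalty.

    local=False: global alignment in the Needleman-Wunsch style.
    local=True:  local alignment in the Smith-Waterman style (cells floored at 0).
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global alignment: leading gaps are penalized.
        for i in range(1, n + 1):
            dp[i][0] = i * gap
        for j in range(1, m + 1):
            dp[0][j] = j * gap
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so never go below 0.
                score = max(score, 0)
                best = max(best, score)
            dp[i][j] = score
    return best if local else dp[n][m]


print(align_score("AC", "A"))                           # global
print(align_score("TTACGTT", "GGACGGG", local=True))    # local: best block is ACG
```

The local version is the one that fits stories like RAD51 and RecA, where a conserved stretch can sit inside sequences that are otherwise unalignable end to end.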
The structure of an RNA viroid. For example, try processing the following sequence:
>gi|341870818|gb|HQ891019.1| Chrysanthemum stunt viroid isolate H5-2, complete genome
CGGGACTTACTTGTGGTTCCTGTGGTGCACTCCTGACCCTGCTGCTTTGAAAGAAAAAGAAATGAGGCGA
AGAAGTCCTTCAGGGATCCCCGGGGAAACCTGGAGGAAGTCCGACGAGATCGCGGCTGGGGCTTAGGACC
CCACTCCTGCGAGACAGGAGTAATCCTAAACAGGGTTTTCACCCTTCCTTTAGTTTCCTTCCTCTCCTGG
AGAGGTCTTCTGCCCTAGCCCGGTCTTCGAAGCTTCCTTTGGCTACTACCCGGTGGAAACAACTGAAGCT
TCAACGCCTTTTTTTCCAATCTTCTTTAGCACCGGGCTAGGGAGTAAGCCCGTGGAACCTTAGTTTTGTT
CCCT
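One way to let students "process" a viroid sequence themselves is a structure prediction based on the sequence's complementarity with itself. The sketch below uses the simple Nussinov base-pair maximization algorithm as a stand-in; it is a deliberate simplification and not what energy-minimization tools such as mfold or ViennaRNA's RNAfold do, so its output should be read as a teaching illustration only. The minimum hairpin loop of 3 nucleotides and the function name are assumptions.

```python
from typing import Tuple

# Allowed pairs: Watson-Crick plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> Tuple[int, str]:
    """Maximize the number of nested base pairs; return (pair count, dot-bracket)."""
    s = seq.upper().replace("T", "U")  # accept DNA-style input such as the FASTA above
    n = len(s)
    if n == 0:
        return 0, ""
    dp = [[0] * n for _ in range(n)]
    # Fill by increasing span j - i; spans <= min_loop cannot close a hairpin.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])
            if (s[i], s[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)
            for k in range(i + 1, j):  # split into two independent substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    # Traceback to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (s[i], s[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return dp[0][n - 1], "".join(structure)


print(nussinov("GGGAAAUCC"))  # a tiny hairpin
```

For the viroid, drop the header line, join the sequence lines into one string and pass it to `nussinov`; the cubic running time is still manageable in pure Python at roughly 350 nt.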
Instead of looking gene by gene for nice examples, you could also offer the example of whole-genome alignment, as was done separately for several yeast and Drosophila species. Yes, this detects new genes and regulatory elements, but more importantly it describes speciation and the degree to which the species have diverged from one another. This has recently been extended by Pääbo's group, who aligned the human and Neandertal genomes and estimated that non-Africans carry roughly 1 to 4% Neandertal DNA. All of this is accomplished with genome-wide alignments.
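To make "the degree to which the species have diverged" concrete, here is a minimal sketch of the simplest divergence summary from an aligned pair of genomes: the proportion of differing sites (p-distance), corrected for multiple substitutions with the Jukes-Cantor (JC69) model. This illustrates divergence only; the Neandertal admixture estimate relied on more specialized statistics. Treating gaps and N as uncomparable is an assumption of this sketch.

```python
import math


def p_distance(aln_a: str, aln_b: str) -> float:
    """Fraction of differing sites among aligned columns where neither
    sequence has a gap ('-') or an ambiguous base ('N')."""
    compared = diffs = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x in "-N" or y in "-N":
            continue  # skip columns that cannot be compared
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jukes_cantor(p: float) -> float:
    """JC69-corrected substitutions per site: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTACGTAA")  # 1 difference in 10 sites
print(p, jukes_cantor(p))
```

The correction matters more as species drift further apart, because a single observed difference may hide several substitutions at the same site.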
Perhaps this one (SNPs resulting in premature stop codons):
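For the premature-stop idea, a minimal sketch of how a nonsense SNP is recognized once a variant has been placed on a coding sequence: apply the substitution and check whether an in-frame stop codon now appears upstream of the reference stop. The function names are hypothetical, and the sketch assumes a forward-strand coding sequence that starts in frame at its first base.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(cds: str) -> Optional[int]:
    """Return the codon index of the first in-frame stop codon, or None."""
    cds = cds.upper()
    for idx in range(0, len(cds) - 2, 3):
        if cds[idx:idx + 3] in STOP_CODONS:
            return idx // 3
    return None


def is_premature_stop(cds: str, pos: int, alt: str) -> bool:
    """True if substituting `alt` at 0-based position `pos` creates an
    in-frame stop codon upstream of the reference stop (a nonsense SNP)."""
    mutant = cds[:pos] + alt + cds[pos + 1:]
    ref_stop = first_stop(cds)
    mut_stop = first_stop(mutant)
    if mut_stop is None:
        return False
    return ref_stop is None or mut_stop < ref_stop


cds = "ATGTGGAAATAA"                    # Met-Trp-Lys-Stop
print(is_premature_stop(cds, 5, "A"))   # TGG (Trp) becomes TGA (stop)
print(is_premature_stop(cds, 8, "G"))   # AAA (Lys) becomes AAG (Lys)
```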