What Would You Recommend For Great Examples Of Sequence Alignment In Biology?
11.4 years ago

In teaching an undergraduate bioinformatics module, I've been contemplating how to engage the students using examples of historic milestones in the application of sequence alignment. I have a few good examples in mind, but I'd be interested in recommendations and suggestions. But here's what I don't want: Examples that show off algorithmic cleverness or bioinformatics computing power as such. These are life sciences students, and I want to provide examples of real biological advances that have utilized sequence alignment.

I can offer a great example: the discovery of the APOA5 gene, which encodes an important apolipoprotein involved in cholesterol and triglyceride homeostasis. The gene was discovered by aligning human and mouse genomic sequences and noticing peaks of higher-than-expected similarity; these turned out to be the APOA5 exons. The work is elegantly described by Pennacchio et al. (2001), who also elucidated the gene's role in triglyceride (TG) homeostasis. Variation in the human APOA5 gene has since been shown, in numerous populations around the world, to lead to differences in TG levels. In some populations, the SNP-TG association is modified by intake of certain dietary fats. In other words, the risk allele does not really confer risk until the diet contains too much or too little of a certain component.

It is rare for such an important gene to have been unknown (not merely poorly described, but completely unknown) before 2001. It is also a nice concrete example of a gene discovered by a simple alignment of genomic sequences, and the same approach is the basis for discovering regulatory elements (along the lines of ENCODE, Jim Noonan's excellent work and Kate Pollard's human accelerated regions, or HARs).
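
For a classroom demo, the "peaks of higher-than-expected similarity" idea can be shown with a sliding-window percent-identity profile over a pairwise alignment, which is roughly what a conservation plot of the human-mouse comparison displays. This is a minimal sketch, not the pipeline Pennacchio et al. used: it assumes the two sequences are already aligned (equal-length strings with '-' for gaps), and the function names `window_identity` and `conserved_windows`, the window size, the step and the 70% threshold are all illustrative choices.

```python
from __future__ import annotations


def window_identity(seq_a: str, seq_b: str, window: int = 100, step: int = 10) -> list[tuple[int, float]]:
    """Percent identity in sliding windows across two pre-aligned sequences.

    Both strings must come from the same alignment (equal length, '-' for gaps).
    A column counts as a match only if both residues are identical and not gaps.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for start in range(0, len(seq_a) - window + 1, step):
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        profile.append((start, 100.0 * matches / window))
    return profile


def conserved_windows(profile: list[tuple[int, float]], threshold: float = 70.0) -> list[int]:
    """Start positions of windows at or above the identity threshold (candidate conserved elements)."""
    return [start for start, pid in profile if pid >= threshold]


# Toy alignment: a conserved block, a divergent block, then a mostly conserved block with a gap.
human = "ACGTACGTACGT"
mouse = "ACGTTTTTAC-T"
profile = window_identity(human, mouse, window=4, step=4)
print(profile)                     # [(0, 100.0), (4, 25.0), (8, 75.0)]
print(conserved_windows(profile))  # [0, 8]
```

On a real human-mouse alignment you would plot the profile and look for windows that stand out above the background; the toy sequences only show the mechanics.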

This is a very nice example, exactly along the lines I was looking for! Thanks!

There is a fairly famous story from the early 1990s: yeast proteins involved in recombination and DNA repair (such as RAD51) turned out to show striking similarity to bacterial RecA proteins, providing evidence that these processes share a common origin in eukaryotes and prokaryotes. See, for example, the following (others came out around the same time): http://www.ncbi.nlm.nih.gov/pubmed/1581961

I remember Doug Bishop telling us in graduate school that this was one of the first examples of database searches and sequence alignment successfully revealing a biological process shared by eukaryotes and prokaryotes, and that sequence similarity really drove the biological discovery.

I like that. I was thinking about using the first publications (ca. 1984-1985) of inferred proto-oncogene activation of a receptor tyrosine kinase. I think these were the first sequence alignments published in Science.

Some of the first alignments go back to Margaret Dayhoff, well before 1984/85.

@Larry, right, alignment itself clearly goes back further, but I think Dayhoff's alignments all assumed homology and didn't generate new hypotheses about common biological processes.

A counterexample might be the sequence of HIV (what Robert Gallo called "HTLV-III" in 1985) compared with HTLV-I.

Notice how tenuous the alignments to HTLV-I are, even for conserved proteins. We now know that HIV has little to do with HTLV-I beyond both being retroviruses that infect humans.

You can really feel Gallo's incredible force of will shoved down the throat of reality.

Complete nucleotide sequence of the AIDS virus, HTLV-III.

http://www.ncbi.nlm.nih.gov/pubmed/2578615

http://www.nature.com/nature/journal/v313/n6000/pdf/313277a0.pdf

+1 for advocating for learning how not to do science.

Yes, a nice example, +1. One could even carry this to something contemporary, like the E. coli outbreak in Germany this spring/summer: the alignments done to identify the source strains and, more interestingly, to show how those strains came together to produce something so deadly.

Terrific. I wish I could mark two answers as the best answer to this question!

Needleman and Wunsch (1970): http://genome.crg.es/seminars/Alineator/papers/needleman70.pdf

Smith and Waterman (1981): http://ibi.zju.edu.cn/bioinplant/courses/smithandwaterman1981.pdf
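
Since these two papers come up, the core recurrence of both algorithms fits in a few lines. This is a minimal, score-only sketch (no traceback) with a simple match/mismatch score and a linear gap penalty; those scoring values are assumptions for teaching, not the schemes used in the 1970 or 1981 papers, and `align_score` is a hypothetical helper name. The only differences between the global (Needleman-Wunsch style) and local (Smith-Waterman style) modes are the gap-penalised first row and column and the floor at zero.

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -2, local: bool = False) -> int:
    """Optimal pairwise alignment score with a linear gap penalty (score only, no traceback).

    local=False: global alignment in the Needleman-Wunsch style.
    local=True:  local alignment in the Smith-Waterman style.
    """
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment must span both sequences, so leading gaps are penalised.
        for i in range(1, rows):
            dp[i][0] = i * gap
        for j in range(1, cols):
            dp[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
            if local:
                # A local alignment may start afresh anywhere, so never go below zero.
                score = max(score, 0)
                best = max(best, score)
            dp[i][j] = score
    # Global score sits in the bottom-right cell; local score is the best cell anywhere.
    return best if local else dp[rows - 1][cols - 1]


print(align_score("ACGT", "AGT"))                         # 1  (ACGT aligned to A-GT)
print(align_score("TTTACGTTT", "GGGACGGGG", local=True))  # 3  (the shared ACG)
```

With `local=True`, unrelated sequences score 0 instead of a large negative number, which is why local alignment (or heuristics approximating it, such as BLAST) underlies most database similarity searches.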

The structure of an RNA viroid. For example, try to process the following sequence:

>gi|341870818|gb|HQ891019.1| Chrysanthemum stunt viroid isolate H5-2, complete genome
CGGGACTTACTTGTGGTTCCTGTGGTGCACTCCTGACCCTGCTGCTTTGAAAGAAAAAGAAATGAGGCGA
AGAAGTCCTTCAGGGATCCCCGGGGAAACCTGGAGGAAGTCCGACGAGATCGCGGCTGGGGCTTAGGACC
CCACTCCTGCGAGACAGGAGTAATCCTAAACAGGGTTTTCACCCTTCCTTTAGTTTCCTTCCTCTCCTGG
AGAGGTCTTCTGCCCTAGCCCGGTCTTCGAAGCTTCCTTTGGCTACTACCCGGTGGAAACAACTGAAGCT
TCAACGCCTTTTTTTCCAATCTTCTTTAGCACCGGGCTAGGGAGTAAGCCCGTGGAACCTTAGTTTTGTT
CCCT
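
If students do try folding this sequence, the simplest algorithm to show them is Nussinov-style base-pair maximization. It is a deliberate simplification: it ignores free energy and stacking, treats the molecule as linear even though viroids are circular, and counts G-U wobble pairs alongside Watson-Crick pairs; real structure prediction uses energy-based tools. `nussinov_max_pairs` and its `min_loop` parameter are hypothetical names for this sketch.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in a linear RNA (Nussinov-style DP).

    DNA letters are accepted (T is read as U). `min_loop` is the minimum number
    of unpaired bases a hairpin must enclose.
    """
    s = seq.upper().replace("T", "U")
    n = len(s)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for s[i..j]; intervals too short to pair stay 0.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])      # leave i or j unpaired
            if (s[i], s[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # pair i with j
            for k in range(i + 1, j):                   # split into two independent halves
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


print(nussinov_max_pairs("GGGAAAUCC"))  # 3: G-C, G-C and a G-U wobble closing an AAA loop
```

To run it on the viroid, join the five sequence lines above into one string; the cubic-time loop is slow in pure Python but workable for a sequence of a few hundred nucleotides.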

Pierre, those are all interesting in their own right, but not really what I am looking for. The publications of the N&W and S&W algorithms were certainly milestones, but in my view the actual alignments were not. The alignments in those original papers serve to demonstrate features of the algorithms. There is some interesting discussion of the alignments in N&W, but I still wouldn't consider them biological milestones. RNA folding is also interesting, but it brings in a lot of issues beyond alignment (e.g., folding topology and free energy calculations).

Instead of looking gene by gene for nice examples, you could also offer the example of whole-genome alignment, as was done separately for several yeast and Drosophila species: yes, to detect new genes and regulatory elements, but more importantly to describe speciation and the degree to which the species have diverged from one another. Pääbo's group recently extended this by aligning the human and Neandertal genomes and finding that non-Africans carry roughly 1-4% Neandertal DNA. All of this is accomplished with genome-wide alignments.

I understand. However, if you show the human-mouse comparison over the APOA5-APOA4-APOC3-APOA1 gene region (60 kbp), you will see peaks of different heights, meaning different levels of conservation. That lets you touch on evolution, rates of change and so forth without going into detail.

Given the way the course is currently structured, that would fit better later on. These are second-year university students, so I need to make the examples relevant to their backgrounds (which, of course, vary).

Very nice segue :-)

jli99

Perhaps this one (SNPs resulting in premature stop codons):

http://genesdev.cshlp.org/content/25/1/1/F3.expansion.html
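
The core check behind a nonsense (premature stop) SNP is easy to demonstrate in code. This sketch does not reproduce the linked figure; it assumes a coding sequence given as DNA that starts in frame at its first base, uses the standard-code stop codons TAA, TAG and TGA, and the function names `first_stop_codon`, `apply_snv` and `is_premature_stop` are hypothetical.

```python
from __future__ import annotations

# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds: str) -> int | None:
    """0-based codon index of the first in-frame stop, or None if there is none."""
    cds = cds.upper()
    for idx in range(len(cds) // 3):
        if cds[3 * idx:3 * idx + 3] in STOP_CODONS:
            return idx
    return None


def apply_snv(cds: str, pos: int, alt: str) -> str:
    """Return the sequence with the base at 0-based position `pos` replaced by `alt`."""
    return cds[:pos] + alt + cds[pos + 1:]


def is_premature_stop(cds: str, pos: int, alt: str) -> bool:
    """True if the variant creates an in-frame stop upstream of the reference stop."""
    ref_stop = first_stop_codon(cds)
    alt_stop = first_stop_codon(apply_snv(cds, pos, alt))
    if alt_stop is None:
        return False
    return ref_stop is None or alt_stop < ref_stop


# Toy coding sequence: ATG AAA CAG TGG TAA (Met-Lys-Gln-Trp-stop).
cds = "ATGAAACAGTGGTAA"
print(is_premature_stop(cds, 6, "T"))   # True:  CAG (Gln) -> TAG (stop)
print(is_premature_stop(cds, 11, "A"))  # True:  TGG (Trp) -> TGA (stop)
print(is_premature_stop(cds, 5, "G"))   # False: AAA -> AAG, both Lys
```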