Question: What Are The Classic Papers In Bioinformatics?
Casey Bergman (Manchester, UK) wrote 3.5 years ago:

A few years back, I asked a dozen or so colleagues for classic/important papers that every bioinformatician should read as a part of their training. I thought BioStar might be a good place to resuscitate this exercise to get a broader set of candidates and let the community weigh in on what papers make up the bioinformatics "canon".

Here are some of the papers that I use for teaching to start the ball rolling:

Altschul et al. Basic local alignment search tool. J Mol Biol. 1990 Oct 5;215(3):403-10. http://www.ncbi.nlm.nih.gov/pubmed/2231712

Myers et al. A whole-genome assembly of Drosophila. Science. 2000 Mar 24;287(5461):2196-204. http://www.ncbi.nlm.nih.gov/pubmed/10731133

Burge & Karlin. Prediction of complete gene structures in human genomic DNA. J Mol Biol. 1997 Apr 25;268(1):78-94. http://www.ncbi.nlm.nih.gov/pubmed/9149143

Lowe & Eddy. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997 Mar 1;25(5):955-64. http://www.ncbi.nlm.nih.gov/pubmed/9023104

Depending on the level of interest in this topic, perhaps we can put together a CiteULike library of "bioinformatics classics".

Mary (Boston MA area) wrote 3.5 years ago:

Oh, I did a blog post on one once. It was part of a "classic papers" blogging initiative that was really fun, actually.

Margaret Dayhoff, a founder of the field of bioinformatics

In it I think I found the first computational protein analysis:

In this paper we shall describe a completed computer program for the IBM 7090, which to our knowledge is the first successful attempt at aiding the analysis of the amino acid chain structure of protein.

The program was called COMPROTEIN (yes, it was all caps). But it was in fact a pipeline of several programs: MAXLAP, MERGE, PEPT, SEARCH, QLIST, and LOGRED.

Reference: Dayhoff, M. O. and R. S. Ledley. Comprotein: A Computer Program to Aid Primary Protein Structure Determination. In Proceedings of the Fall Joint Computer Conference, 1962, 262-274. Santa Monica, CA: American Federation of Information Processing Societies, 1962. http://doi.acm.org/10.1145/1461518.1461546

The link is now broken, though; I'll have to find out where the paper lives now.

This link seems to work: http://portal.acm.org/citation.cfm?id=1461546

Pierre Lindenbaum (France) wrote 3.5 years ago:

Has nobody cited the Smith & Waterman algorithm?

Smith, T. F.; and Waterman, M. S. (1981). "Identification of common molecular subsequences". Journal of Molecular Biology. http://dx.doi.org/10.1016/0022-2836(81)90087-5

and Needleman–Wunsch:

Needleman, Saul B.; and Wunsch, Christian D. (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins". Journal of Molecular Biology 48 (3): 443–53. doi:10.1016/0022-2836(70)90057-4. PMID 5420325.
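For readers who have not met these two algorithms, here is a minimal Python sketch of the score recurrences only (no traceback). It is an illustration, not the code from either paper: the function names and the scoring values (match=2, mismatch=-1, linear gap=-1) are assumptions chosen for the example, and the original papers use their own scoring schemes.

```python
def needleman_wunsch_score(a, b, match=2, mismatch=-1, gap=-1):
    """Global alignment score: both sequences are aligned end to end."""
    # Row 0: aligning an empty prefix of `a` against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix costs i gaps.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment score: best-scoring pair of subsequences."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment restart anywhere; this is the
            # key difference from the global recurrence above.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Both functions keep only two rows of the dynamic programming table, which is enough for the score; recovering the alignment itself would need the full table or a traceback structure.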

Simon Cockell (Newcastle) wrote 3.5 years ago:

I wouldn't normally answer a question twice, but these are unrelated to my first answer.

Important papers to me personally:

Chothia C, Lesk AM. (1986) The relation between the divergence of sequence and structure in proteins. EMBO J. 1986 Apr;5(4):823-6. http://www.ncbi.nlm.nih.gov/pubmed/3709526

Paving the way for homology modelling.

Perkins DN, Pappin DJ, Creasy DM, Cottrell JS. (1999) Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis. 1999 Dec;20(18):3551-67. http://www.ncbi.nlm.nih.gov/pubmed/10612281

The paper that outlined MASCOT, as important as BLAST for proteomics (though SEQUEST came earlier - Eng et al. (1994) J Am Soc Mass Spectrom 5: 976–989. doi:10.1016/1044-0305(94)80016-2).

Simon Cockell (Newcastle) wrote 3.5 years ago:

Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, Lapp H, Lehväslaiho H, Matsalla C, Mungall CJ, Osborne BI, Pocock MR, Schattner P, Senger M, Stein LD, Stupka E, Wilkinson MD, Birney E. The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 2002 Oct;12(10):1611-8. http://www.ncbi.nlm.nih.gov/pubmed/12368254

Other Bio* library papers are available, but I think most would agree that BioPerl is the most "important".

Giovanni M Dall'Olio (London, UK) wrote 3.5 years ago:

Maybe the 1000 Genomes paper published yesterday will open a new era in bioinformatics.

This morning I attended a talk by one of the authors, who explained some of the challenges the 1000 Genomes consortium has faced. For the first time in history, the biggest datasets in biology are reaching the scale of datasets in physics and astronomy. From now on, we will have to think more carefully about the tools we use: for example, physicists have developed an alternative to the Internet for sharing data, while we biologists still download data over HTTP or FTP, competing with people downloading MP3s. We need to look for alternatives for downloading the gigabytes of new data produced daily, such as shared cloud computing images. Moreover, the 1000 Genomes project has also introduced many new formats, like SAM and BAM, and new tools to handle huge datasets.

Israel Barrantes (Magdeburg, Germany) wrote 2.4 years ago:

During the 1990s, these papers were usually classified by bioinformatics research field, i.e., gene prediction (GENSCAN, GLIMMER, etc.), alignment (BLAST, Smith-Waterman, Needleman-Wunsch, etc.), protein structure prediction (Chou-Fasman, etc.), and phylogenetics (PHYLIP, etc.).

Here's a short list of alignment-related articles, in addition to the already listed Smith-Waterman and Needleman-Wunsch papers:

  • Wilson, A.C., Carlson, S.S., White, T.J. (1977) "Biochemical evolution." Ann. Rev. Biochem. 46:573-639.
  • Doolittle, R.F. (1981) "Similar amino acid sequences: chance or common ancestry?" Science 214:149-159.
  • Henikoff, S., Henikoff, J.G. (1992) "Amino acid substitution matrices from protein blocks." Proc. Natl. Acad. Sci. USA 89:10915-10919.
  • Gotoh, O. (1982) "An improved algorithm for matching biological sequences." J. Mol. Biol. 162:705-708.
  • Fitch, W.M., Smith, T.F. (1983) "Optimal sequence alignments." Proc. Natl. Acad. Sci. USA 80:1382-1386.
  • Pearson, W.R., Lipman, D.J. (1988) "Improved tools for biological sequence comparison." Proc. Natl. Acad. Sci. USA 85:2444-2448.
  • Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J. (1990) "Basic local alignment search tool." J. Mol. Biol. 215:403-410.
  • Gish, W., States, D.J. (1993) "Identification of protein coding regions by database similarity search." Nature Genet. 3:266-272.
  • Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res. 25:3389-3402.
  • Henikoff, S., Henikoff, J.G. (1994) "Position-based sequence weights." J. Mol. Biol. 243:574-578.
  • Lipman, D.J., Altschul, S.F., Kececioglu, J.D. (1989) "A tool for multiple sequence alignment." Proc. Natl. Acad. Sci. USA 86:4412-4415.
  • Thompson, J.D., Higgins, D.G., Gibson, T.J. (1994) "CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice." Nucleic Acids Res. 22:4673-4680.
  • Staden, R. (1989) "Methods for discovering novel motifs in nucleic acid sequences." Comput. Appl. Biosci. 5:293-298.
  • Stormo, G.D., Hartzell, G.W. III (1989) "Identifying protein-binding sites from unaligned DNA fragments." Proc. Natl. Acad. Sci. USA 86:1183-1187.
  • Schuler, G.D., Altschul, S.F., Lipman, D.J. (1991) "A workbench for multiple alignment construction and analysis." Proteins 9:180-190.
  • Karlin, S., Altschul, S.F. (1990) "Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes." Proc. Natl. Acad. Sci. USA 87:2264-2268.

There are also the famous articles by Margaret Dayhoff on substitution matrices:

Dayhoff, M.O., Schwartz, R.M., Orcutt, B.C. (1978) "A model of evolutionary change in proteins." In "Atlas of Protein Sequence and Structure, vol. 5, suppl. 3," M.O. Dayhoff (ed.), pp. 345-352, Natl. Biomed. Res. Found., Washington, DC.

Schwartz, R.M., Dayhoff, M.O. (1978) "Matrices for detecting distant relationships." In "Atlas of Protein Sequence and Structure, vol. 5, suppl. 3," M.O. Dayhoff (ed.), pp. 353-358, Natl. Biomed. Res. Found., Washington, DC.

Casey Bergman replied 2.4 years ago:

@Israel - wow! It'll be hard to beat this one, so the green tick mark goes to you.

Konrad wrote 3.5 years ago:

I would also add the first COG paper to the list:

It offers interesting evolutionary insights, and the COG concept is quite a helpful tool, personally speaking.

Giovanni M Dall'Olio (London, UK) wrote 3.2 years ago:

PLoS Computational Biology has recently launched a series of Perspectives called 'The roots of bioinformatics', to highlight the seminal papers in each sub-field of bioinformatics.

To date, only two articles of the series have been published:

  • Searls DB. The roots of bioinformatics. PLoS Comput Biol. 2010 Jun.

  • Doolittle RF. The roots of bioinformatics in protein evolution. PLoS Comput Biol. 2010 Jul 29;6(7):e1000875. Review. PubMed PMID: 20686682; PubMed Central PMCID: PMC2912333.

If you are interested, you can create a citation alert in Entrez for "roots of bioinformatics" PLoS Computational Biology.

Larry_Parnell (Boston, MA USA) wrote 3.2 years ago:

The review by David Searls in PLoS Computational Biology (June 2010) on the roots of bioinformatics will certainly point you to some classic papers, including some you likely never thought of as belonging to this field. The review is very well written and was a joy to read. The paper is here.

I would also add the early papers of JW Fickett on gene modeling based on base composition and comparative approaches.

Pavid wrote 3.5 years ago:

Hey!

Interesting question! I'm new to this field; I actually started a few months ago.

I've read some papers, but I particularly enjoyed this one:

A Quick Guide for Developing Effective Bioinformatics Programming Skills

Doo wrote 3.2 years ago:

The Clustal papers, which are among the most cited papers in the world (across all scientific areas):

.. Thompson, JD; Gibson, TJ; Plewniak, F; Jeanmougin, F; Higgins, DG The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. NUCLEIC ACIDS RESEARCH, 25 (24): 4876-4882 DEC 15 1997

Chenna, R; Sugawara, H; Koike, T; Lopez, R; Gibson, TJ; Higgins, DG; Thompson, JD Multiple sequence alignment with the Clustal series of programs. NUCLEIC ACIDS RESEARCH, 31 (13): 3497-3500 JUL 1 200 ..

I think that is classic

Peter wrote 3.2 years ago:

Ruth Nussinov, George Pieczenik, Jerrold R. Griggs and Daniel J. Kleitman: Algorithms for Loop Matchings. SIAM Journal on Applied Mathematics 35 (1), July 1978, pp. 68-82.

30 years ago, she came up with a beautiful dynamic programming algorithm for secondary structure prediction.
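To give a flavour of that dynamic programming idea, here is a minimal Python sketch of the base-pair maximisation recurrence usually associated with this paper. It is an illustration under stated assumptions, not the paper's own formulation: the function name, the `min_loop` parameter, and the restriction to Watson-Crick pairs (no G-U wobble) are choices made for the example, and it returns only the maximum number of nested pairs, not the structure.

```python
# Assumed pairing rule for the sketch: Watson-Crick pairs only.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def nussinov_max_pairs(seq, min_loop=0):
    """Maximum number of nested (non-crossing) base pairs in `seq`.

    `min_loop` is a hypothetical knob: a pair (k, j) is allowed only if
    more than `min_loop` bases lie between the two positions' indices,
    i.e. j - k > min_loop.
    """
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j] inclusive.
    dp = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j is left unpaired.
            best = dp[i][j - 1]
            # Case 2: position j pairs with some k, splitting the interval
            # into an independent left part and an enclosed inner part.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    inner = dp[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    best = max(best, left + 1 + inner)
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(nussinov_max_pairs("GGGAAACCC", min_loop=3))
```

The table is filled by increasing interval length, so every sub-interval a cell depends on has already been computed; the triple loop makes the running time cubic in the sequence length.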
