How Far Does Bioinformatics Go?
12.3 years ago

Being a metabolomics and drug discovery dude, I consider myself a bioinformatician (well, I also consider myself a chemist, cheminformatician, statistician, and chemometrician, but that's not relevant to my question).

However, some peers see bioinformatics as restricted to things to do with DNA sequences, that is, genomics. So, from a historical and literature perspective, what is bioinformatics? Please do back up your answer and argument with citations to primary literature.

meta subjective • 5.8k views

Since you asked for citations to primary literature, I recently came across this article: Earliest pages of bioinformatics. "This review is a brief outline of the chronology and essence of early events in bioinformatics, covering the period from 1869 (discovery of DNA by Miescher) to 1980-1981 (beginning of massive sequencing). For the purpose of this review, bioinformatics is understood as a chapter of molecular biology dealing with the amino acid and nucleotide sequences and with the information they carry."

12.3 years ago

Bioinformatics is the field of science in which biology and computer science/information technology merge into a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.

Listed below are some of the major events in bioinformatics over the last several decades. Most of the events in the list occurred long before the term, "bioinformatics", was coined.

I've tagged each entry with either:

  • (BIO) if it was an event which was predominantly important in the field of biology.
  • (IT) if it was an event which was predominantly important in the field of computer science/information technology
  • (BIOINFO) if it was an event where biology and computer science/information technology truly merged and we can really speak of bioinformatics.

As you will notice, it becomes increasingly difficult/subjective to catalogue events with exclusively one tag, so the main point to take away is that in the course of bioinformatics history there has been a constant exchange of ideas between biology, computer science/information technology and bioinformatics.

  • 1665 (BIO) Robert Hooke published Micrographia, described the cellular structure of cork. He also described microscopic examinations of fossilized plants and animals, comparing their microscopic structure to that of the living organisms they resembled. He argued for an organic origin of fossils, and suggested a plausible mechanism for their formation.

  • 1683 (BIO) Antoni van Leeuwenhoek discovered bacteria.

  • 1686 (BIO) John Ray, in his book "Historia Plantarum", catalogued and described 18,600 kinds of plants. His book gave the first definition of species based upon common descent.

  • 1843 (BIO) Richard Owen elaborated the distinction of homology and analogy.

  • 1864 (BIO) Ernst Haeckel (Häckel) outlined the essential elements of modern zoological classification.

  • 1865 (BIO) Gregor Mendel (1822-1884), Austria, established the theory of genetic inheritance.

  • 1902 (BIO) The chromosome theory of heredity is proposed by Sutton and Boveri, working independently.

  • 1905 (BIO) The word "genetics" is coined by William Bateson.

  • 1913 (BIO) First ever linkage map created by Columbia undergraduate Alfred Sturtevant (working with T.H. Morgan).

  • 1930 (BIO) A new technique, electrophoresis, is introduced by Tiselius (Uppsala University, Sweden) for separating proteins in solution: "The moving-boundary method of studying the electrophoresis of proteins" (published in Nova Acta Regiae Societatis Scientiarum Upsaliensis, Ser. IV, Vol. 7, No. 4).

  • 1946 (BIO) Genetic material can be transferred laterally between bacterial cells, as shown by Lederberg and Tatum.

  • 1951 (BIO) Pauling and Corey propose the structure for the alpha-helix and beta-sheet (Proc. Natl. Acad. Sci. USA, 37: 205-211, 1951; Proc. Natl. Acad. Sci. USA, 37: 729-740, 1951).

  • 1952 (BIO) Alfred Day Hershey and Martha Chase proved, on the basis of their bacteriophage research, that DNA alone carries genetic information.

  • 1953 (BIO) Watson and Crick propose the double helix model for DNA based on x-ray data obtained by Franklin and Wilkins (Nature, 171: 737-738, 1953).

  • 1954 (BIO) Perutz's group develops heavy atom methods to solve the phase problem in protein crystallography.

  • 1955 (BIO) The sequence of the first protein to be analyzed, bovine insulin, is announced by F. Sanger.

  • 1958 (IT) The Advanced Research Projects Agency (ARPA) is formed in the US

  • 1958 (IT) The first integrated circuit is constructed by Jack Kilby at Texas Instruments.

  • 1961 (BIO) Sidney Brenner, François Jacob, Matthew Meselson, identify messenger RNA

  • 1962 (BIO) Pauling's theory of molecular evolution

  • 1965 (BIO) Margaret Dayhoff's Atlas of Protein Sequence and Structure

  • 1968 (IT) Packet-switching network protocols are presented to ARPA

  • 1969 (IT) The ARPANET is created by linking computers at Stanford, UCSB, The University of Utah and UCLA.

  • 1970 (BIOINFO) The details of the Needleman-Wunsch algorithm for sequence comparison are published.

  • 1971 (IT) Ray Tomlinson (BBN) invents the email program.

  • 1972 (BIO) The first recombinant DNA molecule is created by Paul Berg and his group.

  • 1973 (IT) Robert Metcalfe receives his Ph.D. from Harvard University. His thesis describes Ethernet.

  • 1973 (BIOINFO) The Brookhaven Protein Data Bank is announced (Acta. Cryst. B, 1973, 29: 1746).

  • 1974 (IT) Charles Goldfarb invents SGML (Standard Generalized Markup Language).

  • 1974 (IT) Vint Cerf and Robert Kahn develop the concept of connecting networks of computers into an "internet" and develop the Transmission Control Protocol (TCP).

  • 1975 (BIO) E. M. Southern published the experimental details for the Southern Blot technique for detecting specific sequences of DNA (J. Mol. Biol., 98: 503-517, 1975).

  • 1975 (IT) Microsoft Corporation is founded by Bill Gates and Paul Allen.

  • 1975 (BIO) Two-dimensional electrophoresis, where separation of proteins on SDS polyacrylamide gel is combined with separation according to isoelectric points, is announced by P. H. O'Farrell (J. Biol. Chem., 250: 4007-4021, 1975).

  • 1976 (IT) The Unix-To-Unix Copy Protocol (UUCP) is developed at Bell Labs.

  • 1977 (BIOINFO) Allan Maxam and Walter Gilbert (Harvard) and Frederick Sanger (U.K. Medical Research Council) report methods for sequencing DNA.

  • 1977 (BIOINFO) DNA sequencing and software to analyze it (Staden)

  • 1977 (BIOINFO) The full description of the Brookhaven PDB is published (Bernstein, F.C.; Koetzle, T.F.; Williams, G.J.B.; Meyer, E.F.; Brice, M.D.; Rodgers, J.R.; Kennard, O.; Shimanouchi, T.; Tasumi, M.J.; J. Mol. Biol., 1977, 112: 535).

  • 1978 (IT) The first Usenet connection is established between Duke and the University of North Carolina at Chapel Hill by Tom Truscott, Jim Ellis and Steve Bellovin.

  • 1980 (BIOINFO) IntelliGenetics, Inc. founded in California. Their primary product is the IntelliGenetics Suite of programs for DNA and protein sequence analysis.

  • 1980 (BIO) The first complete genome sequence for an organism (ΦX174) is published. The genome consists of 5,386 base pairs which code nine proteins.

  • 1980 (BIO) Wüthrich et al. publish a paper detailing the use of multi-dimensional NMR for protein structure determination (Kumar, A.; Ernst, R.R.; Wüthrich, K.; Biochem. Biophys. Res. Comm., 1980, 95: 1).

  • 1981 (IT) IBM introduces its Personal Computer to the market.

  • 1981 (BIOINFO) The Smith-Waterman algorithm for sequence alignment is published.

  • 1981 (BIO) The concept of a sequence motif (Doolittle)

  • 1982 (BIOINFO) GenBank Release 3 made public

  • 1982 (BIO) Genetics Computer Group (GCG) created as a part of the University of Wisconsin Biotechnology Center. The company's primary product is The Wisconsin Suite of molecular biology tools.

  • 1982 (BIO) Phage lambda genome sequenced

  • 1983 (IT) Name servers are developed at the University of Wisconsin.

  • 1983 (BIOINFO) Sequence database searching algorithm (Wilbur-Lipman)

  • 1983 (IT) The Compact Disk (CD) is launched.

  • 1984 (IT) Jon Postel's Domain Name System (DNS) is placed on-line.

  • 1984 (IT) The Macintosh is announced by Apple Computer.

  • 1985 (BIOINFO) FASTP/FASTN: fast sequence similarity searching algorithm is published.

  • 1985 (BIO) The PCR reaction is described by Kary Mullis and co-workers.

  • 1986 (IT) NSFnet debuts.

  • 1986 (BIOINFO) The SWISS-PROT database is created by the Department of Medical Biochemistry of the University of Geneva and the European Molecular Biology Laboratory (EMBL).

  • 1986 (BIO) The term "Genomics" appeared for the first time to describe the scientific discipline of mapping, sequencing, and analyzing genes. The term was coined by Thomas Roderick as a name for the new journal.

  • 1987 (BIO) The physical map of E. coli is published (Y. Kohara, et al., Cell 51: 319-337).

  • 1987 (BIO) The use of yeast artificial chromosomes (YAC) is described (David T. Burke, et al., Science, 236: 806-812).

  • 1988 (IT) A new program, an Internet computer virus designed by a student, infects 6,000 military computers in the US.

  • 1988 (BIOINFO) Des Higgins and Paul Sharp announce the development of CLUSTAL (Higgins, D.G.; Sharp, P.M. Fast and sensitive multiple sequence alignments on a microcomputer. Comput. Appl. Biosci. 1989, 5, 151-153; Higgins, D.G.; Sharp, P.M. CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene 1988, 73, 237-244.)

  • 1988 (IT) EMBnet network for database distribution

  • 1988 (BIO) National Center for Biotechnology Information (NCBI) created at NIH/NLM

  • 1988 (IT) Perl (Practical Extraction and Report Language) is released by Larry Wall.

  • 1988 (BIOINFO) The FASTA algorithm for sequence comparison is published by Pearson and Lipman.

  • 1988 (BIO) The Human Genome Initiative is started (Commission on Life Sciences, National Research Council. Mapping and Sequencing the Human Genome, National Academy Press: Washington, D.C., 1988).

  • 1988 (BIO) The National Center for Biotechnology Information (NCBI) is established at the National Library of Medicine (NLM), part of the NIH.

  • 1990 (BIOINFO) BLAST: fast sequence similarity searching (Altschul, et al.) is implemented.

  • 1990 (IT) The HTTP 1.0 specification is published. Tim Berners-Lee publishes the first HTML document.

  • 1991 (BIO) EST: expressed sequence tag sequencing

  • 1991 (IT) Linus Torvalds announces a Unix-Like operating system which later becomes Linux.

  • 1991 (BIO) Myriad Genetics, Inc. is founded in Utah. The company's goal is to lead in the discovery of major common human disease genes and their related pathways. The Company has discovered and sequenced, with its academic collaborators, the following major genes: BRCA1, BRCA2, CHD1, MMAC1, MMSC1, MMSC2, CtIP, p16, p19, and MTS2.

  • 1991 (BIO) The creation and use of expressed sequence tags (ESTs) is described (J. Craig Venter, et al., Science, 252: 1651-1656).

  • 1991 (IT) The research institute in Geneva (CERN) announces the creation of the protocols which make up the World Wide Web.

  • 1992 (BIO) Mel Simon and coworkers announce the use of BACs for cloning.

  • 1992 (BIO) The Institute for Genomic Research (TIGR) is established by Craig Venter.

  • 1993 (BIO) Affymetrix begins independent operations in Santa Clara, California

  • 1993 (BIO) Sanger Centre, Hinxton, UK

  • 1994 (BIOINFO) EMBL European Bioinformatics Institute, Hinxton, UK

  • 1994 (IT) Netscape Communications Corporation is founded and releases Navigator, the commercial version of NCSA's Mosaic.

  • 1994 (BIOINFO) The PRINTS database of protein motifs is published by Attwood and Beck.

  • 1995 (BIO) First bacterial genomes completely sequenced

  • 1995 (IT) Microsoft releases version 1.0 of Internet Explorer.

  • 1995 (IT) Sun releases version 1.0 of Java. Sun and Netscape release version 1.0 of JavaScript

  • 1995 (BIO) The Haemophilus influenzae genome (1.8 Mb) is sequenced.

  • 1995 (BIO) The Mycoplasma genitalium genome is sequenced.

  • 1995 (IT) Version 1.0 of Apache is released.

  • 1996 (BIO) Affymetrix produces the first commercial DNA chips.

  • 1996 (BIO) Oxford Molecular Group acquires the MacVector product from Eastman Kodak.

  • 1996 (BIOINFO) Structural Bioinformatics, Inc. founded in San Diego, CA.

  • 1996 (BIO) The Prosite database is reported by Bairoch.

  • 1996 (BIO) The genome for Saccharomyces cerevisiae (baker's yeast, 12.1 Mb) is sequenced.

  • 1996 (IT) The working draft for XML is released by W3C.

  • 1996 (BIO) Yeast genome completely sequenced

  • 1997 (BIOINFO) LION bioscience AG founded as an integrated genomics company with strong focus on bioinformatics. The company is built from IP out of the European Molecular Biology Laboratory (EMBL), the European Bioinformatics Institute (EBI), the German Cancer Research Center (DKFZ), and the University of Heidelberg.


  • 1997 (BIO) The genome for E. coli (4.7 Mbp) is published.

  • 1998 (BIO) Craig Venter forms Celera in Rockville, Maryland.

  • 1998 (BIOINFO) Inpharmatica, a new Genomics and Bioinformatics company, is established by University College London, the Wolfson Institute for Biomedical Research, five leading scientists from major British academic centers and Unibio Limited.

  • 1998 (BIOINFO) The Swiss Institute of Bioinformatics is established as a non-profit foundation.

  • 1998 (BIO) The genomes for Caenorhabditis elegans and baker's yeast are published.

  • 1998 (BIO) Worm (multicellular) genome completely sequenced

  • 1998 (BIO) deCode genetics publishes a paper that described the location of the FET1 gene, which is responsible for familial essential tremor, on chromosome 13 (Nature Genetics).

  • 1999 (BIO) Fly genome completely sequenced

  • 1999 (BIO) deCode genetics maps the gene linked to pre-eclampsia as a locus on chromosome 2p13.

  • 2000 (BIO) Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL. The large-scale organization of metabolic networks. Nature 2000 Oct 5;407(6804):651-4.

  • 2000 (BIO) The A. thaliana genome (100 Mb) is sequenced.

  • 2000 (BIO) The D. melanogaster genome (180 Mb) is sequenced.

  • 2000 (BIO) The genome for Pseudomonas aeruginosa (6.3 Mbp) is published.

  • 2001 (BIO) The human genome (3,000 Mbp) is published.

  • 2002 (BIO) An international sequencing consortium published the full genome sequence of the common house mouse (2.5 Gb). Whitehead Institute researcher Kerstin Lindblad-Toh is the lead author on the paper; her institution led the project and contributed about half of the sequence. Washington University School of Medicine delivered about 30 percent of the sequence, and created the mouse BAC-based physical map. The Wellcome Trust Sanger Institute in the UK was the third major partner. Other institutes in the International Mouse Genome Sequencing Consortium included the University of California at Santa Cruz, the Institute for Systems Biology, and the University of Geneva.

  • 2004 (BIO) The draft genome sequence of the Brown Norway laboratory rat, Rattus norvegicus, was completed by the Rat Genome Sequencing Project Consortium. The paper appears in the April 1 edition of Nature.

Compiled from different sources, including:


This is awesome. Now where is the 2004 to present... :-)


Wow. That's a pretty good history lesson! Thanks!


So, according to this list Bioinfo is genome only, right?

12.3 years ago
Nicojo ★ 1.1k

As much as I agree with the impressive list from BioGeek, I have to say that it is a non-exhaustive, genomics-centric list.

If we look at the first statement he mentions: "Bioinformatics is the field of science in which biology and computer science/information technology merge into a single discipline."

From that statement I understand that anything "biological" studied using a "computer" should have its place in what we call "Bioinformatics".

One perfect example would be all those people working on proteins and not DNA/RNA. I wouldn't say that those are the same field in Bioinformatics, however you may argue that they fit together... As an example of a "Bioinformatics Center" that focuses on proteins I would give: Stockholm Bioinformatics Center

Another example would be those people trying to understand how molecules diffuse within the cell cytoplasm. That is a lot of computer work that's directly looking at understanding a phenomenon in a biological context. This type of project is bordering on many disciplines and not just biology and informatics, but also physics. Nevertheless, shouldn't that be a "bioinformatics" discipline too? In this category I would give the Biomatter @ MOSAIC ETH Zurich

What about Systems Biology (even if they don't want to be called bioinformaticians), shouldn't they be called bioinformaticians too?

And finally (although far from exhaustive) I'll give one last example of something I consider bioinformatics: the E-Cell Project

I hope this answers your question! In my opinion, bioinformatics is NOT only "genome stuff" and I would extend it to yourself too ;)

12.3 years ago

Many people consider that bioinformatics began with the work of Margaret Dayhoff and the PAM matrices. She compiled the first collection of protein sequences available at the time, publishing the Atlas of Protein Sequence and Structure, and she developed the first method to give a score to the similarity of two proteins, the PAM matrix.
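
To make the idea of "giving a score to the similarity of two proteins" concrete, here is a minimal sketch of scoring two already-aligned sequences with a substitution matrix. The numbers in `TOY_SCORES` are made up for illustration and are not real PAM values; the function names are hypothetical too.

```python
# Toy substitution scores: hypothetical values for illustration only,
# NOT entries from a real PAM matrix.
TOY_SCORES = {
    ("A", "A"): 2, ("L", "L"): 6, ("I", "I"): 5, ("K", "K"): 5,
    ("L", "I"): 2, ("A", "K"): -1, ("A", "L"): -2, ("A", "I"): -1,
    ("L", "K"): -3, ("I", "K"): -2,
}


def pair_score(a, b, scores=TOY_SCORES):
    """Look up the score for one residue pair; the matrix is symmetric."""
    if (a, b) in scores:
        return scores[(a, b)]
    return scores[(b, a)]


def similarity(seq1, seq2, scores=TOY_SCORES):
    """Sum per-position scores for two ungapped sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must already be aligned to equal length")
    return sum(pair_score(a, b, scores) for a, b in zip(seq1, seq2))


if __name__ == "__main__":
    print(similarity("ALIK", "ALIK"))  # identical sequences
    print(similarity("ALIK", "AILK"))  # two conservative swaps
```

A real comparison would use a published matrix and an alignment step that allows gaps; this only shows the lookup-and-sum part.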

For me, bioinformatics is everything that derived from Margaret Dayhoff's work. Compiling data and organizing it, developing tools to compare and handle information, sharing the data with other people: if you read her biography you will find everything already there.

As for modern bioinformatics, I like to think of it as the science of doing experiments, or parts of them, using computers for at least some steps. I like to think that there is no difference between work in a wet lab and work in front of a computer: when you are planning a bioinformatics project, you also have to think of a hypothesis, how to verify it, and which tests and controls you will use. This is probably something that many people haven't understood yet, as they think that bioinformatics is just 'writing programs', and they don't even know what a test is or how much time it takes to write a program.

10.7 years ago
Patrick Koks ▴ 10

Of course, it's not only about definitions and terminology, but in this essay Paulien Hogeweg, who coined the term 'bioinformatics' in 1978 or 1980, or even as early as 1970, describes what her group initially used the term for: "study of informatic processes in biotic systems". Informatics or informatic processes here point at the 'flow of information through a living cell, individual, species or evolution'. Hogeweg's work is focused on pattern recognition, morphogenesis and evolution.

In other words, bioinformatics covers a very broad range of topics and originally wasn't even meant for sequence-based, or even hard data-driven, research.

So metabolomics easily fits in this definition, as long as you study samples from a 'biotic' origin. Reasoning that (small) metabolites have an important role in information transfer is obvious, certainly in drug discovery.

Does this make you a happier bioinformatician? I hope not. (as your own opinion and feeling is what matters in the first place)

Is this answer placing metabolomics right at the heart of the bioinformatics community? I think not, but the good news is that after having sequenced 'everything' in 2012 or 2013, and some chewing on this in the years after that, bioinformaticians will have to follow 'experimentalists' to the core discipline of biology that metabolomics is to become sooner or later.

