We all know that bioinformatics and computational biology are here to stay and that their impact on scientific thought and direction will be increasing in the future. Since there is a lot of interdisciplinary research being undertaken by folks who participate in this forum, I am curious as to what big and interesting biological problems folks think will be best solved either directly by computational approaches or in an integrated computational and bench science environment. Of course, I know that the answer to this is "everything", but I am really curious about specific questions in your field of interest.
I can't resist :-)
We'll get an answer for the "Ultimate Question of Bioinformatics, the Universe, and Everything":
"Did you use 0 or 1 as the starting index of your annotation file?"
edit: that was just for fun, please, don't upvote
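Joke aside, the off-by-one issue is real: BED files use 0-based, half-open intervals, while GFF/GTF files use 1-based, inclusive intervals. Here is a minimal sketch of the conversion. The function names `bed_to_gff` and `gff_to_bed` are made up for illustration and are not from any library.

```python
def bed_to_gff(start, end):
    """Convert a 0-based, half-open interval (BED-style)
    to a 1-based, inclusive interval (GFF-style)."""
    # Only the start shifts; the half-open end equals the inclusive end.
    return start + 1, end


def gff_to_bed(start, end):
    """Convert a 1-based, inclusive interval (GFF-style)
    to a 0-based, half-open interval (BED-style)."""
    return start - 1, end


# The first 10 bases of a sequence, written both ways.
print(bed_to_gff(0, 10))
print(gff_to_bed(1, 10))
```

Note that the interval length is `end - start` in the 0-based, half-open form and `end - start + 1` in the 1-based, inclusive form, which is where most of the silent errors come from.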
I think one of the most under-served areas right now is complex visualizations of enormous and related data sets. It's something I have been thinking a lot about (but not solving, of course).
Directions I like:
* Tip of the Week: Caleydo for gene expression and pathway visualization. I wish I had a tool with 5 planes where I could put all the resources I use and visualize them at once, and connect the pieces somehow.
* Video Tip of the Week: MizBee Synteny Browser. Miriah Meyer has some cool stuff going on.
But it struck me again most recently on the mitochondrial transcriptome. In mitochondria you need both nuclear and mitochondrial genes in the same space, but in no current browser can you really consider both genomes, you know?
I think that's going to be necessary to get more of the bench biologists with the domain knowledge to use the huge volumes of data to crack more problems in general.
More specifically, I think a number of answers in cancer biology are going to come out of the big sequencing projects. Those may be mechanistic insights rather than cures at this point, though. I would very much like to see that progress; I think that's what taxpayers really want from us.
Here are a few that I am betting on being growth areas....
- what is the recent evolutionary history of humans and other species? (population genomics)
- what is the role of unculturable microbes in human health, plant and animal pathology, and ecological and environmental processes? (metagenomics)
- how do we integrate and aggregate information distributed over the bioscience literature? (text mining)
- what is the mechanistic basis of the effect of chromatin remodelling on gene expression? (epigenomics)
Related to the forward genetics response.
As we identify sequence variation/mutation that is correlated with phenotype, one 'next question' that emerges is how these variations affect the molecular function of the gene at the protein or possibly RNA level. For example, if a gene is recurrently mutated in a disease by a non-conservative amino acid substitution, does this result in gain-of-function or loss-of-function? How has this function been conferred or lost? What is it about the 3D structure, interaction partners, dimerization potential, etc. of the protein that is changed by the mutation? Does this change alter the way small molecule inhibitors will interact with the protein? Does it suggest the possibility of a novel drug? How can we predict what that drug might look like without random compound library screening in the lab?
All of these questions relate to the fairly old structure-function relationship problem of molecular biology. Currently very expensive and lengthy wet lab work is required to address them. But we need to be addressing these problems computationally to be more systematic and speed up discovery and clinical translation...
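To make the "non-conservative substitution" idea concrete, here is a rough sketch that labels a protein-level change such as `V600E` by whether the two residues fall in the same coarse physicochemical class. The four-class grouping and the function names are my own simplifying assumptions. This says nothing about gain or loss of function; it is only the kind of first-pass filter one might run before the harder structural questions.

```python
# Coarse physicochemical classes for the 20 standard amino acids.
# ASSUMPTION: this four-way grouping is a textbook-style simplification;
# other groupings (or a substitution matrix) would give different calls.
_CLASSES = {
    "nonpolar": "GAVLIMPFW",
    "polar": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}
AA_CLASS = {aa: name for name, residues in _CLASSES.items() for aa in residues}


def parse_substitution(change):
    """Split a change like 'V600E' into (ref, position, alt)."""
    return change[0], int(change[1:-1]), change[-1]


def classify(change):
    """Label a single amino acid change (hypothetical helper)."""
    ref, _pos, alt = parse_substitution(change)
    if ref == alt:
        return "synonymous"
    if AA_CLASS[ref] == AA_CLASS[alt]:
        return "conservative"
    return "non-conservative"


for change in ["V600E", "L10I", "D5E"]:
    print(change, classify(change))
```

A valine-to-glutamate change swaps a nonpolar residue for an acidic one, so it comes out as non-conservative here, while leucine-to-isoleucine stays within the nonpolar class.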
As an evolutionary biologist working on large ecosystems, I see two very important questions that computational biology will allow us to tackle:
What is the (most probable) "genealogy" for life on earth? Use massive amounts of molecular data to deduce scenarios for the evolution of life on earth.
Can we describe ecosystems as networks of biomolecular interactions and transformations? As the biosphere plays an important role in the processes shaping Earth chemistry and climate, building models describing the main ecosystems at a biomolecular level will be a major breakthrough.
One of the biggest questions of all is forward genetics: the ability to identify the gene or set of genes that are responsible for a particular phenotype. There is a wide array of approaches in this field, but next generation sequencing has made some of them very affordable these days. Dan Koboldt (massgenomics) wrote a list of disease-causing mutations discovered by NGS+bioinformatics approaches in 2011:
One big question that we should be able to start to answer in the near future is what role regulatory variation and mutations play in disease (and other) phenotypes. This is currently receiving only a tiny fraction of people's attention as everyone rushes for the low-hanging fruit of coding mutations. However, soon that fruit will be picked. As declining costs allow us to sequence whole genomes in great numbers (instead of just exomes) we will (finally) be able to start turning more attention to the regulome.
Since nobody has mentioned high-performance computational biology, I think the questions that this discipline will be able to answer in the next decade are:
- can we predict protein structure from sequence?
- can we predict bio/macromolecular dynamics at a scale that can be compared with experimental data?