Which Application Is Truly Missing In Bioinformatics?
17
9
12.7 years ago

It's a simple, straightforward question. Just think about an app that, when you found it, your first thought would be "OMG!!! That's it" or something like "I wish I had found/written/thought of it before". It doesn't need to be a bioinformatics Swiss Army knife or a MacGyver paper clip. Just something that would make your life much happier/easier.

My example is quite simple. I really wish that some sort of Monte Carlo simulator of generic urn models (population genetics rules!) would just appear on the net, with a nice, clean and well-documented API (written in C) and bindings for my favorite scripting languages. That's what I really miss right now. What's your story?
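To make the wish a bit more concrete, here is a minimal sketch of the simplest case I have in mind, in Python rather than C. It assumes a two-color urn that is resampled with replacement every generation (a Wright-Fisher-style model). The function names and parameters are made up for illustration and do not come from any existing library.

```python
import random


def wright_fisher_urn(n_balls, n_marked, rng, max_generations=10_000):
    """Resample a two-color urn with replacement until one color is fixed.

    Returns True if the marked color takes over the urn, False if it is lost.
    """
    count = n_marked
    for _ in range(max_generations):
        if count == 0:
            return False
        if count == n_balls:
            return True
        # Each ball of the next generation copies a random ball of this one.
        p = count / n_balls
        count = sum(1 for _ in range(n_balls) if rng.random() < p)
    raise RuntimeError("no fixation within max_generations")


def estimate_fixation_probability(n_balls, n_marked, replicates=2000, seed=1):
    """Monte Carlo estimate of the probability that the marked color fixes."""
    rng = random.Random(seed)
    fixed = sum(wright_fisher_urn(n_balls, n_marked, rng) for _ in range(replicates))
    return fixed / replicates


if __name__ == "__main__":
    # Under neutrality the estimate should be close to the starting frequency.
    print(estimate_fixation_probability(n_balls=20, n_marked=5))
```

A generic simulator would let you swap the resampling rule (selection, mutation, more colors) behind the same interface, which is exactly the part I would love to see done properly once.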

-- Edit --

I'm considering starting a bounty for this question. Maybe for the first answer to get 10 upvotes. Where are people's wishes? Don't be afraid, bioinformaticians like me are lousy critics.

subjective general • 8.0k views
0

Yeah ! I'm a math nerd . . .

12
12.7 years ago
Neilfws 49k

I think it would be incorrect to imagine that there is a single, "killer app" for bioinformatics. I don't see bioinformatics as a field, discipline or topic. For me it is about: (1) inputs - many, diverse types of biological data, (2) processes - the code that we write to handle the data and (3) outputs - what the code produces and the subsequent biological interpretation.

That said, I'm sure there are tools we would all like to see when we handle whatever data type comes our way. I'd like to see:

1. A RESTful API for every public, online database
2. Better web applications for data integration - so that when I search for, e.g. a gene, everything that's known about that gene and its products is presented to me in a way that enables effective data exploration

There are those who say that the "linked data web" is the answer to (2), which remains to be seen...

1

The Linked Data web is proving really powerful, just by using simple standards... there is the obvious learning curve, and people don't always use the right implementation for the right problem, but the standards are starting to rule...

0

It doesn't need to be a generic killer app. Just the one that would make your life more colorful!!! Maybe we are all dreaming about the same thing...

8
12.7 years ago

I wish R had an entire web application framework like Rails (Shiny covers this) and that there was an easier way to go between genome browsers and analysis and back.

There are also not enough tools to properly organize individual genetic variation in humans, much less in poorly characterized species. I guess we'll have to see what 1000 Genomes comes up with.

I would also like a proper "finishing tool" to visualize and reconcile Velvet assemblies.

7
12.7 years ago
Phis ★ 1.1k

/irony on

Something like this is missing for bioinformatics.

/irony off

Seriously, I think whatever it is, a clean and well documented API - as you say - is something it should provide.

1

rotfl.... +100!

7
11.9 years ago

Well, I'm not looking for a killer app but for the killer file format: I would like to see an obligatory use of XML (or any other standard) for sequences, phylogenies, etc.

Isn't it true that most of our work is related to formatting and struggling with strange output files? For instance, who can tell whether a Newick string contains information on probabilities, distances or time measurement? This heavily depends on the program which generated the output. Therefore, I demand a common file structure with clearly defined fields for any relevant information.
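To illustrate the Newick point, here is a small Python sketch. The tree string is a made-up example and the helper name is hypothetical. The regular expressions are a deliberate simplification (no quoted labels, no comments) and not a full Newick parser.

```python
import re


def newick_numbers(newick):
    """Pull out internal-node labels and per-branch values from a Newick string.

    Simplified on purpose: quoted labels and [comments] are not handled.
    """
    # Text directly after a closing parenthesis is an internal-node label.
    internal_labels = [m for m in re.findall(r"\)([^:;,()]*)", newick) if m]
    # Numbers after a colon are per-branch values.
    branch_values = [float(x) for x in re.findall(r":([0-9.eE+-]+)", newick)]
    return internal_labels, branch_values


if __name__ == "__main__":
    tree = "((A:0.1,B:0.2)0.95:0.3,C:0.4);"
    labels, values = newick_numbers(tree)
    print(labels)  # internal labels as plain strings
    print(values)  # branch values as floats
```

The parser can recover the numbers, but nothing in the string says whether `0.95` is a bootstrap proportion or a posterior probability, or whether the branch values are substitutions per site or time. That information lives only in the documentation of the program that wrote the file.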

4

XML is probably the best format for phylogenetic trees. Nonetheless, XML is notoriously bad for huge data sets, hard to index, and at odds with nearly all useful Unix utilities.

1

Well, I do not think one can deny XML is Unix unfriendly. As for huge data sets, look at tracedb, NGS reads and alignments. No XML at all. Even for relatively smaller data sets such as variants and genotyping results, almost no XML. XML is good for many things, but do not abuse it.

1

That's the truth, right in the face. We do spend a lot of time curating/converting/trimming data from various sources. Some sort of standard/spec or guidelines is much needed. I don't even know why databases like NCBI still offer such a variety of different file formats. I think that this idea deserves a topic of its own.

1

Readability, complexity and efficiency are in conflict with each other. Different people have different preferences, too. I do not see how a single format can be suitable for all purposes, just as no single programming language can rule the world. All we should do is choose the right format for the right thing.

1

XML has lots of problems for bioinformatics. It adds huge bloat (in a field already drowning in data) and it's awkward to represent bio symbols (eg "<" ">" for secondary structure). JSON would be better, but really, IMNSHO, this kind of complaint is as old as the hills, and tends to reflect a lack of understanding of exactly how and why formats in bioinformatics are so disparate, and the difficulties of unifying them (almost insurmountable except in the context of some massive new project which could drive a whole lot of other little changes, such as file format migrations, in its wake)

0

lh3, that's just your humble opinion, isn't it ;) What you are saying is debatable at best.

0

hope, I agree with you! One approach to reducing the interoperability nightmare is BioXSD: http://www.bioxsd.org Further, XML is compressible, supports structured queries (e.g. XPath), and there exist XML databases that support indexing and fast queries (just my humble opinion). XML is a big advantage because it is actually a format (something with a defined formal syntax for which a parser exists), while many other files are not really formats (loosely defined, parsing nightmares, etc.)
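A quick sketch of what I mean by structured queries, using only Python's standard library. The XML layout below is invented for the example and is not the BioXSD schema. `xml.etree.ElementTree` supports only a limited subset of XPath, which is enough here.

```python
import xml.etree.ElementTree as ET

# Hypothetical document layout, made up for illustration (not BioXSD).
DOC = """<sequences>
  <sequence id="seq1" type="dna"><residues>ACGT</residues></sequence>
  <sequence id="seq2" type="protein"><residues>MKV</residues></sequence>
</sequences>"""


def ids_of_type(xml_text, seq_type):
    """Return the ids of all <sequence> elements with the given type attribute."""
    root = ET.fromstring(xml_text)
    query = f"./sequence[@type='{seq_type}']"
    return [el.get("id") for el in root.findall(query)]


def roundtrip_text(text):
    """Serialize text inside an element and parse it back again."""
    el = ET.Element("structure")
    el.text = text
    serialized = ET.tostring(el, encoding="unicode")
    return ET.fromstring(serialized).text


if __name__ == "__main__":
    print(ids_of_type(DOC, "dna"))
    # Symbols such as "<" and ">" are escaped on output and restored on input.
    print(roundtrip_text("<<<..>>>"))
```

The second helper touches on the "<" ">" worry raised earlier in the thread: the serializer escapes those characters, so the data survives, at the price of a less readable file.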

0

Even Word and Excel files are stored in an XML-like way (docx). For me, this format is just a method to provide meaningful additional data alongside the original dataset. Something like EXIF for images.

0

BioXSD looks great! Hopefully it will make its way! It has my support for sure.

0

lh3: "Well, I do not think one can deny XML is Unix unfriendly". Well, I can, because the term "Unix unfriendly" is not defined, and whether or not something is "friendly" towards an OS is totally irrelevant in many cases. Learn to make an informed choice of tools without advocating UNIX, XML or anything else as an ideology; everything has its pros and cons. UNIX was there long before you and I and XML were. There are a lot of useful XML tools that run under various OSes including UNIX, Linux and BSD; search your Linux package manager for XML.

0

In mass spec proteomics, XML seems to be prevalent not just for metadata but for core data too (spectra ~ peak lists, peptide/protein identifications): PRIDE XML, mzML, mzIdentML... and I am saying this neutrally.

0

In mass spec, there is not that much data (compared to NGS).

5
12.7 years ago
Darked89 4.2k

A web-based genome browser (WGB) which does not require 110 Perl modules (one of which simply will not install). A WGB which can handle multiple eukaryotic genomes and next-gen sequence data, and can render things in a way more pleasing to the eye than >>>>>---- or sharp square boxes.

Maybe with this: http://genometools.org/annotationsketch.html or Google Maps.

1

FYI, you can link up (and I have) the GenomeTools AnnotationSketch to this: http://code.google.com/p/genome-browser/ to make a slippy-map genome browser.

1

X:map, at http://xmap.picr.man.ac.uk/, might be something you'd like to take a look at. It's very cool.

0

X:map might be something you'd like to take a look at.

0

JBrowse relies only on BioPerl (and that requirement will probably soon be eliminated).

4
12.7 years ago

I want a bio2rdf for the whole scientific corpus.

0

Now that's a miracle !!! :)

3
11.9 years ago
Tg ▴ 310

Something like CPAN, but for bioinformatics programs instead of Perl modules.

I'm just too lazy to look for a link.

2
12.6 years ago

I think there's room for a killer app in genome browsers, definitely. Each browser has strengths and weaknesses, and if someone could just wrap up all the strengths with none of the weaknesses, bam.

I think more development in the sharing/public data realm would be great for bioinformatics, though I'm not sure if it necessarily has to take the form of an application. Galaxy might be the answer there.

R is good, but there's a lot of room for improvement. Getting some nice streamlined robust documented code for short read analysis would be nice, especially if they could do away with one or two of their complicated objects. (If you think I'm wrong, please point me to your code).

bedtools kind of felt like a killer app to me, as did Galaxy.

2

If it's data originating from a biological experiment and you're using a computer to do something to it, why can't you call it bioinformatics?

2

@Jarretinha: I can't agree with your statement "bioinformatics is not about dealing with sequence data". Sequence analysis (small, large or big data in scale) is an integral part of bioinformatics, not just the tip. Basically, any experimental approach that generates data related to biology matters to bioinformatics. In that sense, sequence data is a key player. Can you name any other data category that has contributed more methods or approaches than sequence data?

0

As noted before, bioinformatics is not about dealing with sequence data. Just a reminder: bioinformatics was born inside the crystallography community and borrowed a lot from phylogenetics and molecular evolution. Despite the popularity of sequence data, they're just the tip of a huge databerg.

0

Bioinformatics started inside the crystallography community. NCBI was founded in 1988 and PDB in 1971. Bioinformatics started with X-ray data and 3D structural data. These definitely aren't sequence data. Sequences are just popular right now. When I think big about bioinformatics, the first thing that comes to mind is CASP. They're not just contributing new methods, they're rigorously benchmarking them. There's nothing similar to it for sequence data, not even in phylogenetics. I'm a sequence guy now, and I can say: we are far behind!

0

@Jarretinha: I agree with your points on NCBI and PDB, but the Atlas of Protein Sequence and Structure (http://www.dayhoff.cc/MODAtlas.html, first published in 1965) is considered to be a seminal work in cataloging and archiving molecular biology data, which led to the data revolution in bioinformatics. I appreciate your opinion and views on CASP, but my point is still valid: even CASP is about fold recognition, assigning a query 'sequence' to a 'fold' in a database of folds. There, too, the entity that we search with is a 'sequence'.

2
11.9 years ago
lh3 33k

I wish we had an integrated pipeline to call SNPs, INDELs and CNV/SVs jointly in one go, and before that I would like to see how far we can get with INDELs and SVs. These are far from solved problems.

I wish we had a de novo assembler that is able to assemble a mammalian genome with reasonable compute resources and achieve a quality close to that of the human reference genome. Most alternative human assemblies are much worse.

I wish we had an aligner that maps sequences not only to the reference genome, but to a collection of genomes. We have made progress, but there is still a long way to go.

Like Jarretinha, I wish we had a flexible software suite that estimates population parameters. I am thinking of a sort of combination of MrBayes and ms rather than a library.

As to web development, my impression is that there are too many alternative ways, which are all good but all make tradeoffs. Different people have different preferences. Just like the diversity of programming languages, the diversity of web development approaches is here to stay.

Ultimately I wish we could standardize each subfield to avoid countless choices (one or two programming languages, genome browsers, ontologies, mappers, assemblers, ...). This may happen, and has already happened, in a few subfields, but it is never going to happen in all.

2
11.7 years ago
Mary 11k

Here's one I loved--not as it was, I'd want to modify it a bit. But this is the one I would love to be using: Caleydo. http://www.icg.tugraz.at/project/caleydo/
I talked about it here in a bit more detail: http://blog.openhelix.eu/?p=3578.

But I like it because it could conceivably put the 5 tools I need to visualize a lot of what I'm doing up in the 3D box. But I want to be able to pick the 5 tools I'm looking at: say UCSC on the box floor, dbSNP on one side, Reactome on another, expression data in another...etc. And if I could link them together with lines for my train of thought...that would solve the integration that I need for my level of exploration in many cases.

Want.

2
11.7 years ago
Andrewjgrimm ▴ 460

Automating grant application writing.

What's the difference between a grant application and a spam email anyway? Both are asking for money, and they both have rejection rates.

1
12.7 years ago
Yannick Wurm ★ 2.4k

I want to be able to orally describe what I want to visualize and have it just happen, without me having to think about how to do it.

1

Okay, this one is just kind of hilarious.

0

Why not a slot/controller in your brain with a direct connection to the machine? No need to talk!!! But how could this be useful for bioinformatics?

0

Because a lot of time in bioinformatics is spent massaging data to get results and visualizations...

0

If only someone would make this cartoon just like that conversation with a biostatistician... http://blog.openhelix.eu/?p=5355

0

@Mary: you can by typing the text at http://www.xtranormal.com/makemovies/

1
11.9 years ago

The killer bioinformatics application I find missing is a sequence aligner that operates in O(1/N).

0

Not even a quantum computer could accomplish that. Maybe a non-deterministic Turing machine or an oracle ...

0
12.7 years ago
Yuri ★ 1.6k

I wish a Bioconductor-like repository existed for MATLAB. Probably not gonna happen. :(

0
6.0 years ago

I am looking for one that gives me the answer to life, the universe and everything. I guess something like echo "42". Killer apps need killer problems to solve.

0
23 months ago
jerry ▴ 110

This is an old but interesting thread. Lots of good ideas (and some funny ones).

As you can see, it's been 10 years since this question was asked, but we are still using many of the same tools (samtools, bedtools, bowtie, UCSC genome browser, dbSNP, etc.). The problem is that there just isn't a market for innovation in bioinformatics: the community and its user base are small. Given how difficult it is to create great bioinformatics tools, it's not worth the effort in most cases.

1

I think the field has matured to a point where we are able to benefit from tools not specifically built for bioinformatics. conda, snakemake, nextflow, etc. are examples of general-purpose tools that we have adapted for our purposes.

The tools you describe are best at dealing with specific file formats or, in cases you have not mentioned (such as plink), at specific tasks or families of tasks. We have the basic pieces, and putting them together doesn't need as much ab initio work.