Question: Which Application Is Truly Missing In Bioinformatics?
9
8.7 years ago by
Jarretinha3.3k
São Paulo, Brazil
Jarretinha3.3k wrote:

It's a simple, straightforward question. Just think about an app where, when you found it, your first thought would be "OMG!!! That's it" or something like "I wish I could have found/written/conceived it before". It doesn't need to be a bioinformatics Swiss Army knife or a MacGyver paper clip. Just something that would make your life much happier/easier.

My example is quite simple. I really wish that some sort of Monte Carlo simulator of generic urn models (population genetics rules!) would just appear on the net, with a nice, clean and well-documented API (written in C) and bindings for my favorite scripting languages. That's what I really miss right now. What's your story?
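To make the wish concrete, here is a minimal sketch of what the core loop of such an urn simulator could look like. It is written in Python rather than C, and the names `simulate_urn` and `polya_rule` are hypothetical, not taken from any existing library. It assumes the simplest case: one ball is drawn per step with probability proportional to its colour's count, and a user-supplied rule says how the urn changes afterwards.

```python
import random
from collections import Counter


def simulate_urn(initial, rule, steps, seed=None):
    """Run one Monte Carlo trajectory of a generic urn model.

    initial: dict mapping colour -> number of balls at the start
    rule:    function(colour) -> dict of count changes applied after a draw
    steps:   number of draws to perform
    """
    rng = random.Random(seed)
    urn = Counter(initial)
    for _ in range(steps):
        colours = [c for c in urn if urn[c] > 0]
        if not colours:
            break  # the urn is empty, nothing left to draw
        weights = [urn[c] for c in colours]
        # Draw one ball with probability proportional to its colour's count.
        drawn = rng.choices(colours, weights=weights, k=1)[0]
        # Apply the replacement rule, never letting a count go negative.
        for colour, delta in rule(drawn).items():
            urn[colour] = max(0, urn[colour] + delta)
    return dict(urn)


def polya_rule(colour):
    """Polya urn: put the ball back and add one more of the same colour."""
    return {colour: 1}


if __name__ == "__main__":
    print(simulate_urn({"A": 1, "a": 1}, polya_rule, steps=1000, seed=42))
```

Other models would only need a different rule function; a real library would also want multi-ball draws and many replicate runs, which this sketch leaves out.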

-- Edit --

I'm considering starting a bounty for this question, maybe for the first answer to get 10 upvotes. Where are people's wishes? Don't be afraid, bioinformaticians like me are lousy critics.

general subjective • 5.0k views
ADD COMMENTlink modified 2.0 years ago by rkostadi60 • written 8.7 years ago by Jarretinha3.3k

Yeah! I'm a math nerd...

ADD REPLYlink written 8.7 years ago by Jarretinha3.3k
11
8.7 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

I think it would be incorrect to imagine that there is a single, "killer app" for bioinformatics. I don't see bioinformatics as a field, discipline or topic. For me it is about: (1) inputs - many, diverse types of biological data, (2) processes - the code that we write to handle the data and (3) outputs - what the code produces and the subsequent biological interpretation.

That said, I'm sure there are tools we would all like to see when we handle whatever data type comes our way. I'd like to see:

  1. A RESTful API for every public, online database
  2. Better web applications for data integration - so that when I search for, e.g. a gene, everything that's known about that gene and its products is presented to me in a way that enables effective data exploration

There are those who say that the "linked data web" is the answer to (2), which remains to be seen...

ADD COMMENTlink written 8.7 years ago by Neilfws48k
1

The Linked Data web is proving really powerful, just by using simple standards... there is the obvious learning curve, and people don't always use the right implementations for the right problem, but the standards are starting to rule...

ADD REPLYlink written 8.2 years ago by Egon Willighagen5.2k

It doesn't need to be a generic killer app. Just the one that would make your life more colorful!!! Maybe we are all dreaming about the same thing...

ADD REPLYlink written 8.7 years ago by Jarretinha3.3k
8
8.7 years ago by
Philadelphia, PA
Jeremy Leipzig18k wrote:

I wish R had an entire web application framework like Rails (Shiny covers this) and that there was an easier way to go between genome browsers and analysis and back.

There are also not enough tools to properly organize individual genetic variation in humans, much less in poorly characterized species. I guess we'll have to see what 1000 Genomes comes up with.

I would also like a proper "finishing tool" to visualize and reconcile Velvet assemblies.

ADD COMMENTlink modified 3.5 years ago • written 8.7 years ago by Jeremy Leipzig18k
7
8.7 years ago by
Phis1.0k
CH
Phis1.0k wrote:

/irony on Something like the following is missing for bioinformatics: http://pdos.csail.mit.edu/scigen/

/irony off

Seriously, I think whatever it is, a clean and well documented API - as you say - is something it should provide.

ADD COMMENTlink written 8.7 years ago by Phis1.0k
1

rotfl.... +100!

ADD REPLYlink written 8.7 years ago by Giovanni M Dall'Olio26k
7
7.9 years ago by

Well, I'm not looking for a killer app but for the killer file format: I would like to see an obligatory use of XML (or any other standard) for sequences, phylogenies, etc.

Isn't it true that most of our work is related to formatting and struggling with strange output files? For instance, who can tell whether a Newick string contains information on probabilities, distances or time measurement? This heavily depends on the program which generated the output. Therefore, I demand a common file structure with clearly defined fields for any relevant information.
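As a small illustration of the Newick ambiguity, here is a hedged sketch in Python. The tree string is made up for the example. The code can pull out the internal node label and the branch lengths, but nothing in the string says whether `95` is a bootstrap percentage, a scaled posterior probability or just a node name, or whether the lengths are distances or time.

```python
import re

# Made-up example tree: one internal node labelled "95", four branch lengths.
newick = "((A:0.1,B:0.2)95:0.3,C:0.4);"

# Labels that directly follow a closing parenthesis belong to internal nodes.
internal_labels = re.findall(r"\)([^:,;()]+)", newick)

# Numbers that follow a colon are branch lengths (plain decimals only here).
branch_lengths = [float(x) for x in re.findall(r":([0-9.]+)", newick)]

print(internal_labels)  # ['95']
print(branch_lengths)   # [0.1, 0.2, 0.3, 0.4]
```

The values are easy to extract; their meaning has to come from knowing which program wrote the file, which is exactly the complaint above.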

ADD COMMENTlink written 7.9 years ago by In The Hope Of A Better World70
4

XML is probably the best format for phylogenetic trees. Nonetheless, XML is notoriously bad for huge data sets, hard to index, and at odds with nearly all useful Unix utilities.

ADD REPLYlink written 7.9 years ago by lh331k
1

Well, I do not think one can deny XML is Unix unfriendly. As for huge data sets, look at tracedb, NGS reads and alignments. No XML at all. Even for relatively smaller data sets such as variants and genotyping results, almost no XML. XML is good for many things, but do not abuse it.

ADD REPLYlink written 7.9 years ago by lh331k
1

That's the truth, right in the face. We do spend a lot of time curating/converting/trimming data from various sources. Some sort of standard/spec or guidelines is much needed. I don't even know why databases like NCBI still offer such a variety of different file formats. I think that this idea deserves a topic of its own.

ADD REPLYlink written 7.9 years ago by Jarretinha3.3k
1

Readability, complexity and efficiency are in conflict with each other. Different people have different preferences, too. I do not see how a single format can be suitable for all purposes, just as no single programming language can rule the world. All we should do is choose the right format for the right thing.

ADD REPLYlink written 7.9 years ago by lh331k
1

XML has lots of problems for bioinformatics. It adds huge bloat (in a field already drowning in data) and it's awkward to represent bio symbols (eg "<" ">" for secondary structure). JSON would be better, but really, IMNSHO, this kind of complaint is as old as the hills, and tends to reflect a lack of understanding of exactly how and why formats in bioinformatics are so disparate, and the difficulties of unifying them (almost insurmountable except in the context of some massive new project which could drive a whole lot of other little changes, such as file format migrations, in its wake)
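The point about "<" and ">" can be seen with a short Python sketch using only the standard library. The bracket string and the `structure` element/key name are made up for illustration; they do not come from any real schema.

```python
import json
from xml.sax.saxutils import escape

# Made-up bracket-style structure annotation.
structure = "<<<<....>>>>"

# In XML the angle brackets must be escaped as entities.
as_xml = "<structure>" + escape(structure) + "</structure>"

# In JSON the same string can be stored as-is.
as_json = json.dumps({"structure": structure})

print(as_xml)
print(as_json)
```

Any XML parser will turn the entities back into brackets, so nothing is lost; the cost is in readability and file size when the raw text is inspected with ordinary text tools.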

ADD REPLYlink written 7.9 years ago by Ihh20

lh3 that's just your humble opinion isn't it ;) what you are saying is debatable at best.

ADD REPLYlink written 7.9 years ago by Michael Dondrup45k

hope, I agree with you! One such approach to reduce the interoperability nightmare is BioXSD: http://www.bioxsd.org. Further, XML is compressible, supports structured queries (e.g. XPath), and there exist XML databases that support indexing and fast queries (just my humble opinion). XML is actually a big advantage because it is a real format (something with a defined formal syntax for which a parser exists), while other files are not formats (loosely defined, parsing nightmares, etc.).
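For readers who have not used structured queries on XML, here is a minimal Python sketch with the standard library's `xml.etree.ElementTree`, which supports a limited subset of XPath. The document layout (`sequences`, `sequence`, `organism`) is invented for the example and is not the BioXSD schema.

```python
import xml.etree.ElementTree as ET

# Invented toy document, not BioXSD.
doc = """
<sequences>
  <sequence id="s1" organism="human">ACGT</sequence>
  <sequence id="s2" organism="mouse">GGCC</sequence>
  <sequence id="s3" organism="human">TTAA</sequence>
</sequences>
"""

root = ET.fromstring(doc)

# XPath-style query: every <sequence> whose organism attribute is "human".
human = root.findall(".//sequence[@organism='human']")
ids = [s.get("id") for s in human]

print(ids)  # ['s1', 's3']
```

The query names the fields it wants instead of relying on column positions or line layout, which is the advantage being argued for here.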

ADD REPLYlink written 7.9 years ago by Michael Dondrup45k

Even Word and Excel files are stored in an XML-like way (docx). For me, this format is just a method to attach meaningful additional data to the original dataset. Something like EXIF for images.

ADD REPLYlink written 7.9 years ago by In The Hope Of A Better World70

BioXSD looks great! Hopefully it will make its way! It has my support for sure.

ADD REPLYlink written 7.9 years ago by In The Hope Of A Better World70

lh3: "Well, I do not think one can deny XML is Unix unfriendly". Well, I can, because the term "Unix unfriendly" is not defined, and whether or not something is "friendly" towards an OS is totally irrelevant in many cases. Learn to make an informed choice of tools without advocating UNIX, XML or anything else as an ideology; everything has its pros and cons. UNIX was there long before you, me and XML. There are a lot of useful tools that run under various OSes including UNIX, Linux and BSD; search your Linux package manager for XML.

ADD REPLYlink written 7.9 years ago by Michael Dondrup45k

In mass spec proteomics, XML seems to be prevalent not just for metadata but for core data too (spectra ~ peak lists, peptide/protein identifications): PRIDE XML, mzML, mzIdentML... and I am saying this neutrally.

ADD REPLYlink written 7.7 years ago by Attila Csordas520

In mass spec, there is not that much data (compared to NGS)

ADD REPLYlink written 7.7 years ago by Aaron Statham1.1k
5
8.7 years ago by
Darked894.2k
Barcelona, Spain
Darked894.2k wrote:

A web-based genome browser (WGB) which does not require 110 Perl modules (one of which simply will not install). A WGB which can handle multiple eukaryotic genomes and next-gen sequence data, and can render things more pleasantly for the eye than >>>>>---- or sharp square boxes.

Maybe with this: http://genometools.org/annotationsketch.html or Google Maps.

ADD COMMENTlink written 8.7 years ago by Darked894.2k
1

FYI, you can link up (and I have) GenomeTools AnnotationSketch to this: http://code.google.com/p/genome-browser/ to make a slippy-map genome browser.

ADD REPLYlink written 8.7 years ago by brentp22k
1

X:map, at http://xmap.picr.man.ac.uk/, might be something you'd like to take a look at. It's very cool.

ADD REPLYlink written 8.4 years ago by Cbare60

JBrowse relies only on BioPerl (and that requirement will probably soon be eliminated)

ADD REPLYlink written 7.9 years ago by Ihh20
4
8.7 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum115k wrote:

I want a bio2rdf for the whole scientific corpus.

ADD COMMENTlink written 8.7 years ago by Pierre Lindenbaum115k

Now that's a miracle !!! :)

ADD REPLYlink written 8.7 years ago by Jarretinha3.3k
3
7.9 years ago by
Tg310
Thailand
Tg310 wrote:

Something like CPAN, but for bioinformatics programs instead of Perl modules.

I'm just too lazy to look for a link.

ADD COMMENTlink written 7.9 years ago by Tg310
2
8.7 years ago by
Madelaine Gogol5.0k
Kansas City
Madelaine Gogol5.0k wrote:

I think there's room for a killer app in genome browsers, definitely. Each browser has strengths and weaknesses, and if someone could just wrap up all the strengths with none of the weaknesses, bam.

I think more development in the sharing/public data realm would be great for bioinformatics, though I'm not sure if it necessarily has to take the form of an application. Galaxy might be the answer there.

R is good, but there's a lot of room for improvement. Getting some nice streamlined robust documented code for short read analysis would be nice, especially if they could do away with one or two of their complicated objects. (If you think I'm wrong, please point me to your code).

bedtools kind of felt like a killer app to me, as did Galaxy.

ADD COMMENTlink written 8.7 years ago by Madelaine Gogol5.0k
2

If it's data originating from a biological experiment and you're using a computer to do something to it, why can't you call it bioinformatics?

ADD REPLYlink written 8.7 years ago by Madelaine Gogol5.0k
2

@Jarretinha: I can't agree with your statement "bioinformatics is not about dealing with sequence data". Sequence analysis (small, large or big-data in scale) is an integral part of bioinformatics, not just the tip. Basically, any experimental approach that generates data related to biology matters to bioinformatics. In that sense, sequence data is a key player. Can you name any other data category that has contributed more methods or approaches than sequence data?

ADD REPLYlink written 8.7 years ago by Khader Shameer17k

As noted before, bioinformatics is not about dealing with sequence data. Just a reminder: bioinformatics was born inside the crystallography community and borrowed a lot from phylogenetics and molecular evolution. Despite the popularity of sequence data, they're just the tip of a huge databerg.

ADD REPLYlink written 8.7 years ago by Jarretinha3.3k

I agree with Khader

ADD REPLYlink written 7.9 years ago by Thaman3.2k

Bioinformatics started inside the crystallography community. NCBI was founded in 1988 and PDB in 1971. Bioinformatics started with X-ray data and 3D structural data. These are definitely not sequence data. Sequences are just popular right now. When I think big about bioinformatics, the first thing that comes to mind is CASP. They're not just contributing new methods, they're rigorously benchmarking them. There's nothing similar to it for sequence data, not even in phylogenetics. I'm a sequence guy now, and I can say: we are far behind!

ADD REPLYlink written 7.9 years ago by Jarretinha3.3k

@Jarretinha: I agree with your points on NCBI and PDB, but the Atlas of Protein Sequence and Structure (http://www.dayhoff.cc/MODAtlas.html, started in 1961) is considered a seminal work in cataloging and archiving molecular biology data, which led to the data revolution in bioinformatics. I appreciate your opinion and views on CASP, but my point is still valid: even CASP is about fold recognition, assigning a query 'sequence' to a 'fold' in a database of folds. There, too, the entity that we use to search is a 'sequence'.

ADD REPLYlink written 7.9 years ago by Khader Shameer17k
2
7.9 years ago by
lh331k
United States
lh331k wrote:

I wish we had an integrated pipeline to call SNPs, INDELs and CNV/SVs jointly in one go, and before that I would like to see how far we can go about INDELs and SVs. These are far from solved problems.

I wish we had a de novo assembler that is able to assemble a mammalian genome with reasonable compute resources and achieve a quality close to that of the human genome. Most alternative human assemblies are much worse.

I wish we had an aligner that maps sequence not only to the reference genome, but to a collection of genomes. We have made progress, but there is still a long way to go.

Like Jarretinha, I wish we had a flexible software suite that estimates population parameters. I am thinking of a sort of combination of MrBayes and ms rather than a library.

As to web development, my impression is that there are too many alternative approaches, which are all good but all make tradeoffs. Different people have different preferences. Just like the diversity of programming languages, the diversity of web development will live long.

Ultimately I wish we could standardize each subfield to avoid countless choices (one or two programming languages, genome browsers, ontologies, mappers, assemblers, ...). This may happen, and has already happened, in a few subfields, but is never going to happen in all.

ADD COMMENTlink written 7.9 years ago by lh331k
2
7.7 years ago by
Mary11k
Boston MA area
Mary11k wrote:

Here's one I loved--not as it was, I'd want to modify it a bit. But this is the one I would love to be using: Caleydo. http://www.icg.tugraz.at/project/caleydo/
I talked about it here in a bit more detail: http://blog.openhelix.eu/?p=3578.

But I like it because it could conceivably put the 5 tools I need to visualize a lot of what I'm doing up in the 3D box. But I want to be able to pick the 5 tools I'm looking at: say UCSC on the box floor, dbSNP on one side, Reactome on another, expression data in another...etc. And if I could link them together with lines for my train of thought...that would solve the integration that I need for my level of exploration in many cases.

Want.

ADD COMMENTlink written 7.7 years ago by Mary11k
2
7.7 years ago by
Andrewjgrimm440
Sydney, Australia
Andrewjgrimm440 wrote:

Automating grant application writing.

What's the difference between a grant application and a spam email anyway? Both are asking for money, and they both have rejection rates.

ADD COMMENTlink written 7.7 years ago by Andrewjgrimm440
2
4.0 years ago by
klemen160
Slovenia
klemen160 wrote:

Hi all,

Since the last post in this thread is almost 4 years old, I am just curious: what has already been solved, what has changed and, more importantly, which application is truly missing in bioinformatics today?

One of the things I still see is the need for some data format standards. Another is the lack of global standards for how to build data analysis pipelines.

I am curious about your thoughts.

ADD COMMENTlink written 4.0 years ago by klemen160
1
8.7 years ago by
Yannick Wurm2.3k
Queen Mary University London
Yannick Wurm2.3k wrote:

I want to be able to orally describe what I want to visualize and have it just happen without me having to think about how to do it.

ADD COMMENTlink written 8.7 years ago by Yannick Wurm2.3k
1

Okay, this one is just kind of hilarious.

ADD REPLYlink written 7.9 years ago by Madelaine Gogol5.0k

Why not a slot/controller in your brain with a direct connection to the machine? No need to talk!!! But how could this be useful for bioinformatics?

ADD REPLYlink written 8.7 years ago by Jarretinha3.3k

because a lot of time in bioinformatics is spent massaging data to get results and visualization...

ADD REPLYlink written 8.7 years ago by Yannick Wurm2.3k

If only someone would make this cartoon just like that conversation with a biostatistician... http://blog.openhelix.eu/?p=5355

ADD REPLYlink written 7.7 years ago by Mary11k

@Mary: you can by typing the text at http://www.xtranormal.com/makemovies/

ADD REPLYlink written 7.7 years ago by Andra Waagmeester3.2k
1
7.9 years ago by
Maastricht
Egon Willighagen5.2k wrote:

The killer bioinformatics application I find missing is a sequence aligner that operates in O(1/N).

ADD COMMENTlink written 7.9 years ago by Egon Willighagen5.2k

Not even a quantum computer could accomplish that. Maybe a non-deterministic Turing machine or an oracle ...

ADD REPLYlink written 7.9 years ago by Jarretinha3.3k
0
8.7 years ago by
Yuri1.5k
Bethesda, MD
Yuri1.5k wrote:

I wish a Bioconductor-like repository existed for MATLAB. Probably not gonna happen. :(

ADD COMMENTlink written 8.7 years ago by Yuri1.5k
0
2.0 years ago by
rkostadi60
rkostadi60 wrote:

I am looking for one that gives me the answer to life, the universe and everything. I guess something like echo "42". Killer apps need killer problems to solve.

ADD COMMENTlink written 2.0 years ago by rkostadi60