Question

Forum: The push toward standardisation

3

6.4 years ago

Kevin Blighe 87k

Good evening,

I am keen to improve the level of standardisation in bioinformatics, as I feel that this is one of the most fundamental barriers to progress that we face / have faced. Other 'industries' or work sectors faced similar barriers in the past when it became apparent that everyone was doing their own thing and had their own idea of how something should be performed. The first example that comes to mind is the formation of the IEEE, which eventually set standards that have resulted in the huge rise in global communication systems. In medicine, there are medical boards and regulatory bodies that oversee treatment of patients, and there exist strict ethical codes that must be followed.

...in bioinformatics, what have we got?

Tomorrow, a new program may appear from some part of the world that seems to convincingly process RNA-seq and ChIP-seq data together in order to find statistically significant ncRNA transcripts that regulate the binding of my transcription factor of interest. The developers may put a strong case for everyone to use their program. A month after its release, what happens? - 20 bugs are already identified and the developers have to release a new version, which is itself erroneous.

What if there was a body that rigorously screened these programs prior to their release (a requirement for publication)?
What if there was a 'best practices' guide for all types of bioinformatics analyses? - pipelines rigorously tested and proven to function.

I've only been here on Biostars for 2 months but I've been analysing all types of data for many years and one of the main battles that I face on a daily basis arises from the lack of interoperability of data-types and navigating through bugs in programs. Additionally, each time I look, there seems to be a new publication about some program that may never even be used.

I've been fairly impressed by the experience and skills of people here and I believe that we represent the best chance in the world at forming such an international body. I believe that things like Biostars Handbook can help but we have to go further than that and represent ourselves at international conferences that can attract people from various sectors who will actually listen to us. People have a genuine interest in bioinformatics but I feel that we let ourselves down through some of the issues that I've already outlined in this message.

I'll listen to whatever people have to say in response to this and then gauge my next action. I would eventually hope to set up a conference in Europe, initially, which would represent the first annual meeting of Biostars where we decide what is needed going forward. Luckily I have close colleagues whose sole business is in setting up conferences and bringing people together. I notice that the entire globe is fairly well represented here, and I personally have close contacts that spread across North and South America, and all across Europe.

Kevin

stadardisation • 1.9k views

ADD COMMENT • link updated 11 months ago by Ram 43k • written 6.4 years ago by Kevin Blighe 87k

score 4 · Accepted Answer · 2017-11-07

4

6.4 years ago

GenoMax 141k

While standardization is good for certain things (e.g. diagnostic/testing software), trying to extend that to all areas of bioinformatics may stifle innovation. Can you imagine being that student/post-doc (and you have been there) with a new method that you worked on for months, failing some standard test or being judged deficient in terms of best practices and getting rejected by a publisher?

We are all helping with standardization of methods/software in a way, by using these programs, reporting bugs and helping the programs become better over time. BWA, DESeq2, DeepTools, STAR, BBMap are all examples of this phenomenon.

ADD COMMENT • link 6.4 years ago by GenoMax 141k

0

Yes, this website does a lot to help others understand what's commonly used for something, what's outdated, etc. Prior to joining, I always just went to Google for solutions and invariably found a recommended solution (untested or tested) on Biostars, StackExchange, or something else.

The part on innovation that you mention is interesting. There is a lot of competition out there right now, with seemingly every PhD student wanting to develop their own program and get it into Bioconductor, but then what appears to happen is that they move on and don't take responsibility for maintenance.

ADD REPLY • link 6.4 years ago by Kevin Blighe 87k

1

There is no good solution for code that goes stale. Unless your current employer uses and/or allows you to maintain code that you own (BBMap suite is one example), there is no financial incentive for a person to do so (availability of spare resources and time may be primary factors as well).

ADD REPLY • link 6.4 years ago by GenoMax 141k

score 3 · Accepted Answer · 2017-11-08

3

6.4 years ago

Charles Plessy ★ 2.9k

I am not sure I understand what you are looking for, but you might find inspiration in large-scale software distributions like Debian (which is obviously more than just that), where we have high aims on software freedom, community standards, and software quality. Once in Debian, a piece of software is regularly assessed for reproducibility, regressions, etc.

ADD COMMENT • link 6.4 years ago by Charles Plessy ★ 2.9k

0

Thanks Charles. I admit to not having used Debian. I currently use Ubuntu (14.04).

My idea was essentially for something akin to the Bioconductor project, but that covers all of bioinformatics. It sounds like Debian is already getting there?

What about 'clinical utility', though? As bioinformatics begins to enter clinical practice, we simply cannot afford to continue with programs that contain bugs.

As an example, suppose we call variants in our clinical laboratory but it's found 1 year later that the variant caller has a bug, meaning that it always misses a variant if it falls within the first 25% of read bases for >50% of aligned reads over a position. It would mean that all samples processed during the year would have to be re-analysed and then clinical reports re-issued (where new variants were found). This would cripple a laboratory. Increasingly heavy reliance on NGS in such laboratories means that these scenarios can occur.

ADD REPLY • link 6.4 years ago by Kevin Blighe 87k

2

If that scenario arises, your clinical lab isn't doing proper validation when vetting their pipeline. The lab should have a baseline stringency that they ensure their pipeline meets before it's ever actually utilized in the treatment of patients. Tools invariably have bugs, even those that are heavily maintained and used in production, and particularly in bioinformatics, where new edge cases are constantly being introduced because it's, well, research. There's always another question to answer, an unconventional way to investigate the data, etc. This is part of why new tools pop up so often and why old ones get discarded. Some tools adapt and are maintained to match changes in the field or new methodologies; others don't and die off.
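As a rough illustration of the kind of baseline check described above, here is a minimal Python sketch that compares a pipeline's variant calls against a truth set and only passes if it meets agreed thresholds. The `Variant` tuple, the `validate_calls` function, the 0.99 thresholds and the example variants are all hypothetical and chosen for illustration; a real clinical validation would use reference samples and criteria set by the laboratory.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Minimal variant key (hypothetical; real pipelines track more fields)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def validate_calls(truth: Iterable[Variant],
                   called: Iterable[Variant],
                   min_sensitivity: float = 0.99,
                   min_precision: float = 0.99) -> dict:
    """Compare called variants with a truth set and apply baseline thresholds.

    The thresholds are illustrative assumptions, not recommended values.
    """
    truth_set, called_set = set(truth), set(called)
    true_positives = len(truth_set & called_set)
    # Guard against empty sets so the ratios are always defined.
    sensitivity = true_positives / len(truth_set) if truth_set else 1.0
    precision = true_positives / len(called_set) if called_set else 1.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "missed": sorted(truth_set - called_set),
        "passed": sensitivity >= min_sensitivity and precision >= min_precision,
    }


# Made-up example: the pipeline misses one of four known variants.
truth = [
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 250, "C", "T"),
    Variant("chr2", 500, "G", "T"),
    Variant("chr3", 42, "T", "C"),
]
called = [v for v in truth if v.chrom != "chr2"]
report = validate_calls(truth, called)
print(report["passed"], report["missed"])
```

Listing the missed variants alongside the pass/fail flag makes it easier to spot a systematic blind spot, such as the position-dependent bug imagined earlier in the thread, before the pipeline is used on patient samples.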

I don't disagree with the idea of code review being a necessity for publication (black-box programs and packages are far too common). Bare minimal documentation of the source code should be required, in my opinion. Standardization is an odd way of phrasing that though; it's more just best practices in my eyes. Hopefully as bioinformatics becomes a more and more necessary tool for the average researcher, education will improve and standards will rise. I would love it if programs stuck to the standard data formats rather than creating their own unnecessary format though.

ADD REPLY • link 6.4 years ago by jared.andrews07 ★ 16k

0

Very well said Jared - thanks.

"If that scenario arises, your clinical lab isn't doing proper validation when vetting their pipeline."

Yes, you got me there!

ADD REPLY • link 6.4 years ago by Kevin Blighe 87k

1

Re-running a workflow (pipeline) after changing a single component is difficult when using pre-packaged software. For instance, a package with just the bug fixed may not exist (instead, there may be an upgraded version where the bug was fixed and, in addition, some algorithm modified).

Perhaps what you are looking for(wards) is a meta-distribution of standard pipelines with standard test data, with continuous integration and robust regression testing?
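To make the regression-testing idea concrete, here is a minimal Python sketch, assuming a pipeline can be treated as a function from input records to output lines. It hashes the output produced from fixed test data and compares it with a stored baseline digest. The names `output_digest`, `regression_check`, `toy_pipeline` and `buggy_pipeline` are hypothetical, and skipping lines that start with `##` is an assumption about where run-specific metadata lives.

```python
import hashlib
from typing import Callable, Iterable


def output_digest(lines: Iterable[str]) -> str:
    """SHA-256 digest of pipeline output, ignoring '##' metadata lines."""
    digest = hashlib.sha256()
    for line in lines:
        if line.startswith("##"):
            # Assumed to hold run-specific metadata (dates, versions).
            continue
        digest.update(line.rstrip("\n").encode("utf-8") + b"\n")
    return digest.hexdigest()


def regression_check(pipeline: Callable[[Iterable[str]], Iterable[str]],
                     test_input: Iterable[str],
                     expected_digest: str) -> bool:
    """Return True if the pipeline still reproduces the baseline output."""
    return output_digest(pipeline(test_input)) == expected_digest


def toy_pipeline(records):
    """Stand-in for a real pipeline: sorts and upper-cases its records."""
    yield "##generated-by=toy_pipeline"
    for record in sorted(records):
        yield record.upper()


def buggy_pipeline(records):
    """Same as toy_pipeline but silently drops the first sorted record."""
    yield "##generated-by=buggy_pipeline"
    for record in sorted(records)[1:]:
        yield record.upper()


test_input = ["acgt", "ttga", "ccat"]
# In practice the baseline digest would be stored alongside the test data.
baseline = output_digest(toy_pipeline(test_input))

print(regression_check(toy_pipeline, test_input, baseline))
print(regression_check(buggy_pipeline, test_input, baseline))
```

A continuous integration job could run a check like this on every new package version, so that a change in output on the standard test data is flagged before the version reaches users.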

ADD REPLY • link 6.4 years ago by Charles Plessy ★ 2.9k

0

Yes, something along those lines. As Jared mentions, it's more about coming up with many 'best practice' guidelines, akin to what The Broad Institute has for analysing DNA-seq data, although theirs is obviously heavily biased toward using the GATK, which is just one of many variant callers.

I think that these thoughts are going to sit for a while and then I'll shelve them as I get too busy in the day job again..!

ADD REPLY • link 6.4 years ago by Kevin Blighe 87k