Question

Blue Collar Bioinformatics ... What Are The Boring / Monotonous / Day To Day Tasks You'd Like "Solved"?

3

12.4 years ago

Delinquentme ▴ 200

The title covers it pretty well...

Given that there are a number of programmers here, I'm wondering whether this is already something of a "solved problem",

or whether there are still numerous tasks slowing things down, specifically tasks that could be automated.

Is there any chance that these could be outsourced? Perhaps something sufficiently simple that it could be performed by someone without a bioinformatics background,

freeing up your time?

career • 5.4k views

ADD COMMENT • link updated 12.4 years ago by Pierre Lindenbaum 161k • written 12.4 years ago by Delinquentme ▴ 200

2

Could you make this question more precise? Most jobs involve some degree of boring, monotonous day-to-day work which is unavoidable, rather than a problem in need of solution. Perhaps we could focus on problems which could be solved faster by technological solutions, rather than a vague notion of "I am above this boring work."

ADD REPLY • link 12.4 years ago by Neilfws 49k

score 7 · Answer 1 · 2011-11-24

It may well be that there are common bottlenecks in our analysis pipelines, which could be improved using better code, better working practice or better education. Having said that...

...80-90% of my time is spent gathering data, cleaning it up, getting it into a state where it can be analysed and learning how to use the tools to do the analysis.

So I'd argue that most bioinformatics (and indeed many other jobs) consists of boring, monotonous day-to-day tasks. It is not a problem to be solved: it's just "how things are."

To whom would you outsource such work? Companies specialising in boredom? People who don't mind being bored? Who would pay for that? It's quicker, easier and more practical simply to get on with it yourself.

J.C.R. Licklider figured this out over 50 years ago in Man-Computer Symbiosis. He analysed the time that he spent working on problems and wrote:

It soon became apparent that the main thing I did was to keep records, and the project would have become an infinite regress if the keeping of records had been carried through in the detail envisaged in the initial plan. It was not. Nevertheless, I obtained a picture of my activities that gave me pause. Perhaps my spectrum is not typical--I hope it is not, but I fear it is.

About 85 per cent of my "thinking" time was spent getting into a position to think, to make a decision, to learn something I needed to know. Much more time went into finding or obtaining information than into digesting it. Hours went into the plotting of graphs, and other hours into instructing an assistant how to plot. When the graphs were finished, the relations were obvious at once, but the plotting had to be done in order to make them so. At one point, it was necessary to compare six experimental determinations of a function relating speech-intelligibility to speech-to-noise ratio. No two experimenters had used the same definition or measure of speech-to-noise ratio. Several hours of calculating were required to get the data into comparable form. When they were in comparable form, it took only a few seconds to determine what I needed to know.

Throughout the period I examined, in short, my "thinking" time was devoted mainly to activities that were essentially clerical or mechanical: searching, calculating, plotting, transforming, determining the logical or dynamic consequences of a set of assumptions or hypotheses, preparing the way for a decision or an insight. Moreover, my choices of what to attempt and what not to attempt were determined to an embarrassingly great extent by considerations of clerical feasibility, not intellectual capability.

score 3 · Answer 2 · 2011-11-24

12.4 years ago

Pierre Lindenbaum 161k

"are there numerous tasks which are slowing things down"

Most Unix tools (cut, awk, sort, etc.) address columns by index ($1, $2, ...). I wish they could operate on column names (CHROM, POS, ...), with the type of each column (int, double, string, rs, gene, etc.) detected automatically.

EDIT:

Something like:

xsort +CHROM +POS extended.vcf | xcut CHROM POS REF ALT |\
xuniq CHROM POS REF ALT |\
xgrep rs ID |\
xselect '($CHROM=="chr1" and $POS>1)' |\
xsaveas save.hdf5
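
The x-tools above are wishful thinking, but part of the idea can be approximated with a few lines of Python. This is only a minimal sketch, assuming a tab-separated file with a header row; the helper names (`guess_type`, `read_named`) and the toy table are hypothetical, not existing tools or real data, and the type detection only tries int, float and string:

```python
import csv
import io


def guess_type(values):
    """Guess a column type: try int, then float, otherwise fall back to str."""
    for caster in (int, float):
        try:
            for v in values:
                caster(v)
            return caster
        except (ValueError, TypeError):
            continue
    return str


def read_named(handle, delimiter="\t"):
    """Read a headered table into dicts keyed by column name, with guessed types."""
    rows = list(csv.DictReader(handle, delimiter=delimiter))
    if not rows:
        return []
    types = {name: guess_type([r[name] for r in rows]) for name in rows[0]}
    return [{name: types[name](value) for name, value in r.items()} for r in rows]


# Toy stand-in for a VCF-like table (hypothetical data).
text = (
    "CHROM\tPOS\tID\tREF\tALT\n"
    "chr2\t5\trs3\tA\tG\n"
    "chr1\t20\trs2\tC\tT\n"
    "chr1\t3\t.\tG\tA\n"
    "chr1\t1\trs1\tT\tC\n"
)

rows = read_named(io.StringIO(text))
# Sort by CHROM then POS, keep rs IDs on chr1 with POS > 1, project four columns.
rows.sort(key=lambda r: (r["CHROM"], r["POS"]))
selected = [
    (r["CHROM"], r["POS"], r["REF"], r["ALT"])
    for r in rows
    if r["ID"].startswith("rs") and r["CHROM"] == "chr1" and r["POS"] > 1
]
print(selected)
```

Because POS is detected as an integer, sorting puts position 3 before position 20, which a plain text sort on that column would not do.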

0

Could I get more context on this? I'm guessing these columns are in sequence data?

ADD REPLY • link 12.4 years ago by Delinquentme ▴ 200

0

I've updated my answer.

ADD REPLY • link 12.4 years ago by Pierre Lindenbaum 161k

0

So this is "variable input" for Unix tools, to specify which columns of the data you'd like to ... display? I'd expect there to be a robust sorting tool that supports this kind of thing. XSLT is the first thing that comes to mind?

ADD REPLY • link 12.4 years ago by Delinquentme ▴ 200

score 2 · Answer 3 · 2011-11-24

12.4 years ago

Gjain 5.8k

With the advent of new technologies, there is too much data that needs to be annotated. The growing number of assembled genomes all have to be “annotated”: genes have to be identified and marked out, their functions have to be determined, and so on. We need better and more accurate annotation tools.

0

PLEASE correct me if I'm wrong, but annotating is a process of: 1) finding a sequence between introns and exons; 2) searching it against a DB of known genes; 3) if a match is found, marking it on that sequence, and if there is no match, moving along the sequence to the next intron.

This sounds like a task which could be easily mechanized, but I'm guessing I'm missing something?

ADD REPLY • link 12.4 years ago by Delinquentme ▴ 200

0

You are right, and what you described is the first step of annotation, but there is more to it. Beyond the basic level of annotation, which is using BLAST to find similarities and then annotating genomes based on them, more and more additional information is nowadays added to the annotation platform. This additional information allows manual annotators to deconvolute discrepancies between genes that are given the same annotation.

ADD REPLY • link 12.4 years ago by Gjain 5.8k

0

Some databases use genome context information, similarity scores, experimental data, and integrations of other resources to provide genome annotations through their Subsystems approach. Other databases (e.g. Ensembl) rely on both curated data sources and a range of different software tools in their automated genome annotation pipeline. You can get more insight from http://www.ncbi.nlm.nih.gov/pubmed/11433356

ADD REPLY • link 12.4 years ago by Gjain 5.8k

0

The "annotation platform"? And "deconvolute discrepancies between genes that are given the same annotation": meaning you've got one gene that has two differing sets of notes on it, from two databases, and you want to pick out which is the valid annotation ... right?

ADD REPLY • link 12.4 years ago by Delinquentme ▴ 200

0

Yes, that is correct.

ADD REPLY • link 12.4 years ago by Gjain 5.8k

0

How does one determine which is the correct annotation? And other than the fact that this is a human pursuit, what creates such contradictions?

ADD REPLY • link 12.4 years ago by Delinquentme ▴ 200