Forum: What Is The Proper Way To Think About Reinventing The Wheel As A Bioinformatician?
5
15
11.2 years ago
KCC ★ 4.1k

I suspect I am not the only one who reinvents the wheel, since there are so many tools with overlapping functionality. In my case, I was concerned about the time it took to convert between different genomic data formats, and it felt easier to write my own converters. They would never stand up to use with everybody's data, but for my data and my hardware they work well. Also, because I wrote them, I know exactly how they work. I find it's often easier to write my own than to learn somebody else's system. Thus, I often have to overcome my natural tendency to ignore prior work.

So, what is the proper way to think about reinventing the wheel as a bioinformatician?

I understand that there is no rule that could fit everybody. However, I was wondering if more experienced folks could walk me through their thought processes when writing tools that they know to be duplicative in some way.
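
For concreteness, here is a minimal sketch of the kind of small, data-specific converter described above, written in Python as an illustration. The function name `fastq_to_fasta` is hypothetical, and it assumes unwrapped FASTQ with exactly four lines per record; it is not meant to stand up to everybody's data.

```python
import io
from typing import TextIO


def fastq_to_fasta(src: TextIO, dst: TextIO) -> int:
    """Copy records from a 4-line-per-record FASTQ stream to FASTA.

    Returns the number of records written. Assumes sequences and
    qualities are not wrapped across multiple lines.
    """
    count = 0
    while True:
        header = src.readline()
        if not header:
            break  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = src.readline()
        plus = src.readline()
        qual = src.readline()
        if not header.startswith("@") or not plus.startswith("+") or not qual:
            raise ValueError(f"malformed FASTQ near record {count + 1}")
        dst.write(">" + header[1:].rstrip("\n") + "\n")
        dst.write(seq.rstrip("\n") + "\n")
        count += 1
    return count


# Small in-memory demonstration.
src = io.StringIO("@read1\nACGT\n+\nIIII\n@read2 extra\nGGCA\n+\nFFFF\n")
dst = io.StringIO()
n = fastq_to_fasta(src, dst)
fasta_text = dst.getvalue()
```

Reading four lines at a time, rather than scanning for lines that start with `@`, matters because a quality string may itself begin with `@`. This is the sort of detail an established tool already handles, which is part of the trade-off being asked about.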

software-development • 4.8k views
11
11.2 years ago

The best reason, personally, for re-inventing the wheel is that you learn the nitty gritty details of how something works.

5

Exactly what I was going to say. If you want to use a method, take something off the shelf. If you want to fully understand and/or improve upon a method, you have to dissect it. My dad once told me he would buy me a new stereo if I could tell him how one works. I never got the new stereo, but the concept stuck with me.

8
11.2 years ago

I like those posts about reinventing the wheel;

6
11.2 years ago
SES 8.6k

There are several valid reasons I can think of to develop a tool that already exists.

1) Software has a limited lifespan. A lot of bioinformatics software is developed by grad students or postdocs who leave their job after a few years and go on to something else. Sometimes the project moves with them and is maintained, but many times the code is left to rot on a server in their old lab. On many occasions I've found tools that were designed to do exactly what I wanted, but the tool was developed 10 years prior, so it is often difficult or impossible to compile and almost guaranteed to break if it does any file parsing (except maybe FASTA).

2) Performance. Even if you find some old code that works, chances are that it was not designed with the scale of modern data sets in mind and it may be too slow, if it works at all.

3) Implementing new methodology. Computing technology and technology in the life sciences move so fast that it's difficult to think of tools that are more than 10 years old and still relevant. There are definitely some out there, but quite often you can benefit from rethinking the problem with your data and resources in mind and come up with a better solution. You do not have to think very hard to manipulate a file of 10 sequences, but the situation is different if you have millions.

4) Science is competitive. People develop protocols in parallel all the time, always have. It is good to be efficient and not waste time on something that has been solved, but you also don't want to spend days trying to make some awkward script work on your massive HiSeq data set. There are always deadlines you are facing as well, so sometimes writing the code is easier than trying to understand someone else's code.

5) It is easier. For simple tasks, it may be much faster and easier to just write your own method rather than depending on a large library like BioPerl to always be there, especially if you are going to be moving your code around.
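
To make points 3 and 5 concrete, here is a minimal sketch, in Python rather than Perl, of the kind of small parser that can replace a library dependency for a simple task. The function name `read_fasta` is hypothetical, and it assumes plain, well-formed FASTA text. It yields one record at a time, so memory use depends on the largest record rather than on the size of the file.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs one record at a time."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                # A new header closes the previous record.
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over many lines
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


# Small in-memory demonstration.
example = io.StringIO(">seq1 demo\nACGT\nTTGA\n\n>seq2\nGGCC\n")
records = list(read_fasta(example))
```

With a real file, the same generator can be driven by `with open(path) as handle: for header, seq in read_fasta(handle): ...`, where `path` is whatever file you are working on.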

5
11.2 years ago
Fwip ▴ 500

My personal rule: Don't work too hard, and leverage existing tools and libraries. If I can write 10 lines of shell script or 30 lines of (Bio)perl, then I think that's fine.

But if it's ballooning to 200+ lines, that's almost always a sign that I'm putting too much effort in, and reimplementing an existing tool poorly.

0

Agreed: it is always a balancing act between finding and evaluating a tool and writing it yourself.

2
11.2 years ago
William ★ 5.3k

Reinventing the wheel is often a sign of one of the deadly sins of bioinformatics:

1) Parochialism and insularity

2) Exceptionalism

3) Autonomy or death!

4) Vanity: Pride and Narcissism

5) Monolith Megalomania

6) Scientific method Sloth

7) Instant Gratification

See the Seven deadly sins of bioinformatics presentation by Carole Goble. http://www.slideshare.net/dullhunk/the-seven-deadly-sins-of-bioinformatics

I try to avoid reinventing the wheel as much as possible, unless I am sure that I can invent a better or new wheel.

0

Quite interesting slides; I also recommend them. I feel it should be clarified that the OP mainly focuses on reinventing simple routines such as format conversion, which can be done in a few hours, while these slides focus more on reinvention on a larger scale, such as reinventing a database, which takes many more man-days.

