Question

We Have The Minimum Of Everything Required For Bioinformatics Analysis; Why Do We Need More?

10.5 years ago

jobinv ★ 1.1k

Our research facility is heading towards using more exome sequencing and RNA-sequencing in our work, i.e. stepping up to more data-intensive areas as compared to our previous focus on microarray data and primarily wet lab research. We would not be doing the sequencing itself (we would use commercial partners for this), but the analysis and interpretation would be on our table.

However, while this is the ambition, my superiors are rather new to this area themselves and may not fully understand what it demands. For one, on the analysis side, I am the only person in our group working with bioinformatics, and even I have quite limited experience in the field, mainly learning as I go. This is not too problematic, however: the volume that we produce is not very high, and I get by through consulting more experienced bioinformaticians, including of course the Biostar community.

Secondly, in terms of computational power, we have a single, humble machine for the computational work (quad core 3.30 GHz with 64 GB RAM). This also seems to be sufficient for the work that we are doing; after all, I have been able to perform complete pipelines of exome sequencing analysis, RNA-Seq analysis and microarray analysis on this computer.

Thirdly, in terms of storage capacity, we currently have a 3 TB drive on this computer, which is rapidly filling up. This is quite obviously not enough in the long run, but my supervisor seems to be inclined towards buying new external hard drives as we need them. Based on impressions that I've picked up, I am trying to convince him that it would be much better to have an operational server. However, this would entail that we would need additional staff to be in charge of the server maintenance and regular backups. Hiring additional staff is of course very expensive, and I would need convincing arguments to present this.
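To put a number on "rapidly filling up", a small check like the following could be run on the analysis machine. This is only a sketch: the function names and the 80% threshold are illustrative assumptions, not part of any existing setup.

```python
import shutil

def usage_fraction(path="."):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total

def needs_attention(path=".", threshold=0.8):
    """True when the drive is at or above the (assumed) warning threshold."""
    return usage_fraction(path) >= threshold

if __name__ == "__main__":
    # "." is a placeholder; point this at the data drive's mount point.
    print(f"used: {usage_fraction('.'):.1%}, warn: {needs_attention('.')}")
```

Logging this figure regularly would give a growth trend to show when arguing for either more drives or a server.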

Fourthly, in terms of data management, we're currently keeping everything the old-fashioned way, with a bunch of files lying around in a bunch of folders. I would imagine that the ideal situation would be to have our data stored in a queryable database. Admittedly, I know so little about databases that I can't make solid arguments for this position, but I do believe that such a setup would make access easier and more flexible, even though I can't concretely detail what I mean by that.
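As a rough illustration of what "queryable" could mean here, the sketch below keeps the data files where they are and only indexes them in a small SQLite catalog. The table layout, function names, sample IDs and paths are all hypothetical, chosen only for the example.

```python
import sqlite3

def build_catalog(records):
    """Create an in-memory SQLite catalog from (sample, assay, path) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (sample TEXT, assay TEXT, path TEXT)")
    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", records)
    conn.commit()
    return conn

def paths_for(conn, assay):
    """Return the sorted file paths recorded for one assay type."""
    rows = conn.execute(
        "SELECT path FROM files WHERE assay = ? ORDER BY path", (assay,)
    )
    return [row[0] for row in rows]

# Hypothetical records; a real catalog would be filled from the folder tree.
records = [
    ("S1", "exome", "/data/exome/S1.bam"),
    ("S1", "rnaseq", "/data/rnaseq/S1.bam"),
    ("S2", "rnaseq", "/data/rnaseq/S2.bam"),
]
catalog = build_catalog(records)
rnaseq_paths = paths_for(catalog, "rnaseq")
```

The point of such an index is that a question like "which RNA-Seq files do we have?" becomes one query instead of a manual search through folders; the large files themselves never have to live inside the database.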

My question (we finally get to this) is as follows: in what areas should we really aim to step up our game? Also, what would be convincing arguments to invest money (and effort) into doing that? Keep in mind that I have to convey these arguments to biologists, who are in charge of the big money bag.

updated 10.5 years ago by Istvan Albert 100k • written 10.5 years ago by jobinv ★ 1.1k

Follow-up question here: What are the advantages of data management in databases?

10.5 years ago by jobinv ★ 1.1k

score 6 · Answer 1 · 2013-10-12

The problems that you face are very common - how to scale up computation without radically increasing the cost.

As you note, the salary for additional staff is currently the most substantial cost, and one that does not lend itself to gradual increases as the need grows.

The ideal solution would be to outsource your computational needs to a trustworthy third party - of course finding that party is very difficult.

(Personal musings: for what it's worth, I am considering adding a "project" section to Biostar that would work both ways, connecting people who need bioinformatics assistance with those who can provide it. But for that, checks and balances need to be in place so that a third party can audit the process.)

As for your problem, I do believe that for projects at least one order of magnitude smaller than the human genome one can get by with far fewer computational resources. For example, it may surprise some, but I have found that with good data and optimal coverage one can assemble a bacterial genome even on a MacBook Air.

I think your best option would be to get a larger server with sufficient RAM and storage for your lab, in a configuration that would not necessarily need separate maintenance. For example, you can get a tower workstation at http://www.penguincomputing.com/ with 30 TB storage, 32 CPU cores and 196 GB RAM for around $15K - a system that, based on your use cases, would most likely serve your needs for many years to come.