Question: We Have The Minimum Of Everything Required For Bioinformatics Analysis; Why Do We Need More?
7.3 years ago by
jobinv1.1k
Bergen, Norway
jobinv1.1k wrote:

Our research facility is heading towards using more exome sequencing and RNA-sequencing in our work, i.e. stepping up to more data-intensive areas as compared to our previous focus on microarray data and primarily wet lab research. We would not be doing the sequencing itself (we would use commercial partners for this), but the analysis and interpretation would be on our table.

However, while this is the ambition, my superiors are fairly new to this area themselves and may not fully understand what such an ambition demands. For one, on the analysis side, I am the only person in our group working on bioinformatics, and even I have quite limited experience in the field, mainly learning as I go. This is not all that problematic, however: the volume that we produce is not very high, and I am able to get by with help from more experienced bioinformaticians, including of course the Biostar community.

Secondly, in terms of computational power, we have a single, humble machine for the computational work (quad core 3.30 GHz with 64 GB RAM). This also seems to be sufficient for the work that we are doing; after all, I have been able to perform complete pipelines of exome sequencing analysis, RNA-Seq analysis and microarray analysis on this computer.

Thirdly, in terms of storage capacity, we currently have a 3 TB drive on this computer, which is rapidly filling up. This is quite obviously not enough in the long run, but my supervisor seems to be inclined towards buying new external hard drives as we need them. Based on impressions that I've picked up, I am trying to convince him that it would be much better to have an operational server. However, this would entail that we would need additional staff to be in charge of the server maintenance and regular backups. Hiring additional staff is of course very expensive, and I would need convincing arguments to present this.
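
One way to put numbers on the storage argument is a rough projection of when the current drive fills up. The sketch below is only illustrative: the function name `months_until_full` and the growth figure are assumptions made up for the example, not measurements from our machine, and the estimate is a plain linear extrapolation.

```python
import shutil

TB = 10**12  # decimal terabyte, as drive vendors count it


def months_until_full(total_bytes, used_bytes, growth_bytes_per_month):
    """Linear estimate of how many months remain before the drive is full."""
    if growth_bytes_per_month <= 0:
        raise ValueError("growth_bytes_per_month must be positive")
    free_bytes = max(total_bytes - used_bytes, 0)
    return free_bytes / growth_bytes_per_month


if __name__ == "__main__":
    # Real usage of the drive holding the current directory.
    usage = shutil.disk_usage(".")
    # Hypothetical growth rate: replace with what your sequencing runs add.
    assumed_growth = TB // 4
    months = months_until_full(usage.total, usage.used, assumed_growth)
    print(f"Used {usage.used / TB:.2f} of {usage.total / TB:.2f} TB; "
          f"about {months:.1f} months left at the assumed growth rate")
```

A figure like "full in N months" may be easier to present to a supervisor than a general feeling that the drive is filling up.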

Fourthly, in terms of data management, we're currently keeping everything the old-fashioned way, with a bunch of files lying around in a bunch of folders. I would imagine that the ideal situation would be to have our data stored in a queryable database. Admittedly, I know so little about databases that I can't really make solid arguments for this position, but I do believe that such a setup would make access easier and more flexible, even though I can't concretely detail what I mean by that.
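
To make the "queryable database" idea a bit more concrete, here is a minimal sketch of a file catalog using Python's built-in `sqlite3` module. The large data files stay in their folders; only metadata about them goes into the database. The table layout, column names, sample IDs and paths are all hypothetical, chosen just for illustration.

```python
import sqlite3


def build_catalog(conn):
    """Create two small tables: one for samples, one for the files they own."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS samples (
            sample_id TEXT PRIMARY KEY,
            assay     TEXT NOT NULL,   -- e.g. 'exome', 'RNA-Seq', 'microarray'
            tissue    TEXT
        );
        CREATE TABLE IF NOT EXISTS files (
            path      TEXT PRIMARY KEY,
            sample_id TEXT NOT NULL REFERENCES samples(sample_id),
            kind      TEXT NOT NULL    -- e.g. 'fastq', 'bam', 'vcf'
        );
    """)


def add_sample(conn, sample_id, assay, tissue):
    conn.execute("INSERT INTO samples VALUES (?, ?, ?)",
                 (sample_id, assay, tissue))


def add_file(conn, path, sample_id, kind):
    conn.execute("INSERT INTO files VALUES (?, ?, ?)",
                 (path, sample_id, kind))


def files_for(conn, assay, kind):
    """Return paths of all files of a given kind for a given assay type."""
    rows = conn.execute(
        "SELECT f.path FROM files f "
        "JOIN samples s ON s.sample_id = f.sample_id "
        "WHERE s.assay = ? AND f.kind = ? ORDER BY f.path",
        (assay, kind),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file path to persist the catalog
    build_catalog(conn)
    add_sample(conn, "S1", "RNA-Seq", "liver")
    add_file(conn, "/data/run1/S1.bam", "S1", "bam")
    print(files_for(conn, "RNA-Seq", "bam"))
```

The point of the sketch is that a question like "where are all the BAM files from our RNA-Seq samples?" becomes a single query instead of a search through folders.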

My question (we finally get to this) is as follows: in what areas should we really aim to step up our game? Also, what would be convincing arguments to invest money (and effort) into doing that? Keep in mind that I have to convey these arguments to biologists, who are in charge of the big money bag.

modified 7.3 years ago by Istvan Albert ♦♦ 86k • written 7.3 years ago by jobinv1.1k

Follow-up question here: What are the advantages of data management in databases?

written 7.3 years ago by jobinv1.1k
7.3 years ago by
Istvan Albert ♦♦ 86k
University Park, USA
Istvan Albert ♦♦ 86k wrote:

The problems that you face are very common - how to scale up computation without radically increasing the cost.

As you note, the salary for additional staff is currently the most substantial cost - one that does not lend itself to gradual increases as the need grows.

The ideal solution would be to outsource your computational needs to a trustworthy third party - of course finding that party is very difficult.

(Personal musings: for what it's worth, I am considering adding a "project" section to Biostar that could work both ways, connecting people who need bioinformatics assistance with those who are able to provide it. But for that, checks and balances need to be in place so that a third party can audit the process.)

As for your problem, I do believe that for projects at least one order of magnitude smaller than the human genome, one can get by with far fewer computational resources. For example, it may surprise some, but I have noticed that with good data and optimal coverage one can assemble a bacterial genome even on a MacBook Air.

I think your best option would be to get a larger server with sufficient RAM and storage for your lab, in a configuration that would not necessarily need separate maintenance. For example, you can get a tower workstation at http://www.penguincomputing.com/ with 30TB storage, 32 CPU cores and 196GB RAM for around $15K - a system that, based on your use cases, would most likely serve your needs for many years to come.

modified 7.3 years ago • written 7.3 years ago by Istvan Albert ♦♦ 86k

Thank you for the tip. I will look into that site, they look like quite sturdy machines indeed.

written 7.3 years ago by jobinv1.1k

-Personal musings: for what it's worth, I am considering adding a "project" section to Biostar that could work both ways, connecting people who need bioinformatics assistance with those who are able to provide it. But for that, checks and balances need to be in place so that a third party can audit the process.

This is a terrific idea Albert! I sometimes play matchmaker between small wet labs and small computational labs. I'm happy to do it but I'm not very efficient! =)

written 7.3 years ago by Michele Busby2.1k

Thanks, I am certain that it would be of great interest to many - but it is an idea where the execution/implementation is very important. We'll need a lot of feedback and evaluation on the implementation once we get going.

written 7.3 years ago by Istvan Albert ♦♦ 86k