Question: Hardware Resources for HPC in Bioinformatics
Jarretinha (3.3k; São Paulo, Brazil) wrote 9.3 years ago:

Greetings everybody,

We're planning to build a very powerful computing machine to serve bioinformatics applications here at HCFMUSP (check my profile). I know that the common choice is to build a cluster or go cloud. But our adventurous spirit urges us toward some experimentation. We are somewhat envious of proprietary solutions using FPGA cards like these:

CLCbio Cube

TimeLogic DeCypher

Pico Computing E-FPGA

For those who have never heard of FPGAs, I suggest checking Wikipedia on these topics:

Field Programmable Gate Array

Reconfigurable Computing

Several important bioinformatics algorithms have been implemented on those platforms. This is just one example:

160-fold acceleration of the Smith-Waterman algorithm using a field programmable gate array (FPGA)
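
For readers who don't know the algorithm being accelerated, here is a minimal software sketch of the Smith-Waterman local alignment score. The function name and the scoring values (match=2, mismatch=-1, gap=-2, linear gap penalty) are assumptions chosen for illustration; they are not taken from the paper above.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty).

    Scoring values are illustrative assumptions, not from any cited paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # A cell depends only on its left, upper and upper-left neighbours;
            # the 0 lets a local alignment restart anywhere.
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Since each cell needs only three neighbours, all cells on one anti-diagonal can be computed at the same time, which is the kind of parallelism dedicated hardware can exploit.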

Does anybody have some experience with these cards? Do they scale well? Are they worth the trouble?

Cheers, Daniel

-- Edit --

Finally, my server is online!!! For now it's just some Xeons with lots of RAM. But in the near future some Tesla/Fermi cards will be added. Happy!!!

modified 8.3 years ago by Jan Van Haarst (300) • written 9.3 years ago by Jarretinha (3.3k)

Hi Jarretinha... It would be good if you could add links. For instance, I assume HCFMUSP is in Brazil, but honestly, the abbreviation doesn't mean anything to me. Also, lots of people won't know what an FPGA is: you could make that a link to Wikipedia. Same with the different commercial solutions you're suggesting, etc.

written 9.3 years ago by Nicojo (1.1k)

How does it sound now? I can add more references.

written 9.3 years ago by Jarretinha (3.3k)

Thanks for the links; they are most helpful. A very interesting question indeed.

written 9.3 years ago by Istvan Albert (♦♦ 80k)

Excellent edit! Thanks ;)

written 9.3 years ago by Nicojo (1.1k)

Brandstaetter (270; Austria) wrote 9.3 years ago:

From what I've read, FPGAs are fine, but expensive. Have you looked into other, more readily available (and mostly easier to program) architectures? I'm thinking of GPUs (CUDA, OpenCL) and Cell B.E. chips (in the PlayStation 3, programmable via C/C++ and also via OpenCL).

written 9.3 years ago by Brandstaetter (270)

CUDA cards are on my wishlist. I've just acquired a SuperServer 6016GT-TF and fully intend to fill it up with NVIDIA Tesla/Fermi cards. CUDA for bioinformatics really works, and it is quite affordable.

Cell chips are nice but hard to program and port to. Compilers and libraries for this type of architecture aren't in good shape right now. Anyway, the speedup is comparable to that of CUDA. OpenCL isn't mature enough either.

By the way, some Xilinx FPGAs include embedded PowerPC cores, which makes them a sort of Cell when properly assembled.

written 9.3 years ago by Jarretinha (3.3k)

Jan Van Haarst (300; Wageningen, NL) wrote 9.3 years ago:

There is a reason that everybody uses clusters/clouds: they are simple to set up and very flexible.

An FPGA can be faster, but a cluster of CPUs running an optimized program will scale better and will be faster in the end.

GPGPU is limited to a certain type of application: doing the same calculation on a lot of data, and each piece of that data should be small.

I would put my money into a cluster/cloud rather than invest in FPGAs and GPUs.

written 9.3 years ago by Jan Van Haarst (300)

A friend of mine (who deploys clusters) said the same thing. But I saw the CLCbio Cube in action on a large dataset and was very impressed. And we have some concerns about energy consumption (watts per flop). That's why I'm looking for people with some experience with these things. By the way, the cloud isn't mature enough to deliver the same performance as an in-house cluster. Check out this example: http://www.genomeweb.com/sites/default/files/walker.pdf

written 9.3 years ago by Jarretinha (3.3k)

The example you pointed to is about a type of HPC that is largely irrelevant to bioinformatics: sharing a lot of data between processes. What we usually see in sequence analysis is another kind of parallelisation: doing the same type of analysis on a lot of data. A good example of cloud-based computing is CloudBurst: http://sourceforge.net/apps/mediawiki/cloudburst-bio/index.php?title=CloudBurst
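
As a rough illustration of "the same analysis on a lot of data", here is a minimal sketch in which every sequence is processed independently of all the others. The GC-content function and the names `gc_content` and `analyse_all` are hypothetical stand-ins for a real per-record analysis; nothing here comes from CloudBurst.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(seq):
    """Fraction of G/C bases in one sequence; needs no other record."""
    if not seq:
        return 0.0
    return sum(base in "GCgc" for base in seq) / len(seq)


def analyse_all(seqs, workers=4):
    # No record depends on another, so the input can be split freely.
    # Threads keep this sketch portable; for CPU-bound work one would swap in
    # ProcessPoolExecutor or a cluster scheduler with the same map pattern.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gc_content, seqs))


print(analyse_all(["GGCC", "ATAT", "ACGT"]))
```

Because the workers never exchange data, this pattern needs no fast interconnect, which is why it maps well onto loosely coupled cloud nodes.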

written 9.3 years ago by Jan Van Haarst (300)

Bioinformatics doesn't stop at genome assembly or microarray analysis. Most of what I do depends on sharing a lot of data. Certain types of alignment do, too. I know that CFD is very different, but many phylogenetics tasks lie in the EP category. Most population genetics tasks lie in the EP/FT category. And many systems biology tasks lie in CG. Your example is just the entry point of the bioinformatics pipeline. Just try to compute the unrooted phylogeny of all Archaea using complete genomes and you'll see the problem.

written 9.3 years ago by Jarretinha (3.3k)

I would go as far as to say that outside of assembly, very few bioinformatics processes require the kind of tight coupling that can't be addressed by smart distributed computing. FPGAs in particular make the work/$ equation relatively unattractive (in both hardware cost and development cost). GPUs, on the other hand, while not suitable for all problems, and especially not for many bioinformatics problems, do change the economics a bit.

written 9.3 years ago by Mndoci (1.2k)

Darked89 (4.2k; Barcelona, Spain) wrote 9.3 years ago:

It all depends on how diverse the applications running on this beast will be. If the end users range from DNA sequencing, NMR and mass spec to crystallography, and the total number of applications is, say, 50+, it is unlikely you will be able to support them all, not only on FPGAs but even on CUDA. Either something installs and compiles (almost) out of the box or you may have to drop it. Software authors will be of no help when it comes to porting it (and possibly a bunch of libraries it depends on) to a new platform they do not even have in house.

On the other hand, whenever the problem is restricted to one domain, FPGAs are great. I used SORCERER for protein mass spec and DeCypher for BLAST searches.

Anyway, have fun with new servers, whatever they will be :-)

modified 9.3 years ago • written 9.3 years ago by Darked89 (4.2k)

Good point. The applications are the computer ... ;-)

written 9.3 years ago by Istvan Albert (♦♦ 80k)

Most people here deal with sequence data and microarrays. So the basic idea is to use FPGAs for sequence data (higher demand) and CUDA for microarrays and related work. Molecular dynamics and related stuff rely on another cluster. Here, we will develop solutions on FPGAs (the ultimate dream is an FPGA card able to perform Burrows-Wheeler transform based alignments). WE are the tinkerers... Anyway, some people might want a proprietary solution. It's good to know they are worth the trouble, though.
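
To make the Burrows-Wheeler idea concrete, here is a naive software sketch of the transform plus exact-match counting by backward search. The function names are hypothetical, the `$` sentinel is assumed not to occur in the input, and the sort-all-rotations construction only suits tiny examples; real aligners use far more compact indexes.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (naive, small inputs only)."""
    text += "$"  # sentinel: assumed absent from text and smaller than any symbol
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str, pattern):
    """Count exact occurrences of pattern using backward search on the BWT."""
    # first_row[c]: number of symbols in the text strictly smaller than c
    first_row, total = {}, 0
    for c in sorted(set(bwt_str)):
        first_row[c] = total
        total += bwt_str.count(c)

    def occ(c, i):
        # occurrences of c in bwt_str[:i]; a real index would precompute this
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # current range of matching sorted rotations
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        lo = first_row[c] + occ(c, lo)
        hi = first_row[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


index = bwt("banana")
print(index, count_occurrences(index, "ana"))
```

Each pattern character costs two rank lookups and a couple of additions, a small fixed amount of work per step, which is part of what makes the approach look tempting for a hardware pipeline.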

written 9.3 years ago by Jarretinha (3.3k)