Question

Hardware Resources For HPC In Bioinformatics

9

14.1 years ago

Jarretinha 3.4k

Greetings everybody,

We're planning to build a very powerful computing machine to serve bioinformatics applications here at HCFMUSP (check my profile). I know that the common choice is to build a cluster or go to the cloud, but our adventurous spirit calls for some experimentation. We are somewhat envious of proprietary solutions using FPGA cards like these:

For those who have never heard of FPGAs, I suggest checking Wikipedia on these topics:

There are several possible implementations of important bioinformatics algorithms on those platforms. This is just one example:

160-fold acceleration of the Smith-Waterman algorithm using a field programmable gate array (FPGA)
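
For readers unfamiliar with the algorithm named in that title, here is a minimal sketch of Smith-Waterman local alignment scoring in plain Python. The function name and the scoring values (match +2, mismatch -1, linear gap -1) are illustrative assumptions, not taken from the paper. Each cell depends only on its upper, left, and upper-left neighbours, so cells on the same anti-diagonal are independent of each other, which is the kind of parallelism a hardware implementation can exploit.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b.

    Scoring parameters are illustrative defaults (assumption), using a
    simple linear gap penalty rather than an affine one.
    """
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Open or extend a gap in one of the two sequences.
            up = prev[j] + gap
            left = curr[j - 1] + gap
            # Local alignment: a score never drops below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

This version keeps only two rows in memory and returns the score alone; recovering the alignment itself would need the full matrix and a traceback step.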

Does anybody have some experience with these cards? Do they scale well? Are they worth the trouble?

Cheers,
Daniel

-- Edit --

Finally, my server is online!!! For now it's just some Xeons with lots of RAM, but in the near future some Tesla/Fermi cards will be added. Happy!!!

hpc • 5.0k views

ADD COMMENT • link updated 5 months ago by Ram 43k • written 14.1 years ago by Jarretinha 3.4k

1

Hi Jarretinha... It would be good if you could add links. For instance, I assume HCFMUSP is in Brazil, but honestly, the abbreviation doesn't mean anything to me. Also, lots of people won't know what an FPGA is: you could make that a link to Wikipedia. The same goes for the different commercial solutions you're suggesting, etc.

ADD REPLY • link 14.1 years ago by Nicojo ★ 1.1k

0

How does it sound now? I can add more references.

ADD REPLY • link 14.1 years ago by Jarretinha 3.4k

0

Thanks for the links, they are most helpful and also a very interesting question indeed.

ADD REPLY • link 14.1 years ago by Istvan Albert 100k

0

Excellent edit! Thanks ;)

ADD REPLY • link 14.1 years ago by Nicojo ★ 1.1k

score 4 · Answer 1 · 2010-03-17

4

14.1 years ago

Brandstaetter ▴ 270

From what I've read, FPGAs are fine but expensive. Have you looked into other, more readily available (and mostly easier to program) architectures? I'm thinking of GPUs (CUDA, OpenCL) and Cell B.E. chips (in the PlayStation 3, programmable via C/C++ and also via OpenCL).

ADD COMMENT • link 14.1 years ago by Brandstaetter ▴ 270

0

CUDA cards are on my wishlist. I've just acquired a SuperServer 6016GT-TF and fully intend to fill it with NVIDIA Tesla/Fermi cards. CUDA for bioinformatics really works, and it is quite affordable.

Cell chips are nice but hard to program and port to. Compilers and libraries for this type of architecture aren't in good shape right now. Anyway, the speedup is comparable to that of CUDA. OpenCL isn't mature enough yet either.

By the way, some Xilinx FPGAs include PowerPC cores, which makes them a sort of Cell when properly assembled.

ADD REPLY • link 14.1 years ago by Jarretinha 3.4k

score 4 · Answer 2 · 2010-03-26

4

14.1 years ago

Jan Van Haarst ▴ 300

There is a reason that everybody uses clusters/clouds: they are simple to set up and very flexible.

FPGAs can be faster, but a cluster of CPUs with an optimized program will scale better and will be faster in the end.

GPGPU is rather limited to a certain type of application: doing the same calculation on a lot of data, and that data should be small.

I would put my money in a cluster/cloud rather than invest in FPGAs and GPUs.

ADD COMMENT • link 14.1 years ago by Jan Van Haarst ▴ 300

0

A friend of mine (who deploys clusters) said the same thing. But I saw the CLCbio Cube in action on a large dataset and was very impressed. We also have some concerns about energy consumption (watts per flop). That's why I'm looking for people with some experience with these things. By the way, the cloud isn't mature enough to deliver the same performance as an in-house cluster. Check out this example: http://www.genomeweb.com/sites/default/files/walker.pdf

ADD REPLY • link 14.1 years ago by Jarretinha 3.4k

0

The example you pointed to is about a type of HPC that is largely irrelevant to bioinformatics: one that shares a lot of data between processes.

What we usually see in sequence analysis is a different kind of parallelisation: doing the same type of analysis on a lot of data.

A good example of cloud-based computing is CloudBurst.
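
The "same analysis on a lot of data" pattern described above can be sketched in a few lines of Python. This is only an illustration: the per-sequence task (GC content) and the names `gc_content` and `analyze_all` are hypothetical stand-ins for a real analysis. A thread pool is used to keep the sketch portable. For CPU-bound work one would more likely use separate processes or separate cluster jobs, which works because no task needs data from any other task.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(seq):
    """Fraction of G/C bases in one sequence (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def analyze_all(seqs, workers=4):
    """Apply the same independent analysis to every sequence.

    Each call to gc_content touches only its own input, so the work can
    be split across workers (or machines) without any communication.
    Results come back in the same order as the inputs.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gc_content, seqs))


if __name__ == "__main__":
    reads = ["ATAT", "GGCC", "ATGC"]
    print(analyze_all(reads))
```

Tasks that must exchange intermediate results, as in the tightly coupled benchmarks discussed in this thread, do not decompose this way.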

ADD REPLY • link updated 4.7 years ago by Ram 43k • written 14.1 years ago by Jan Van Haarst ▴ 300

0

Bioinformatics doesn't stop at genome assembly or microarray analysis. Most of what I do depends on sharing a lot of data, and so do certain types of alignment. I know that CFD is very different, but many phylogenetics tasks fall in the EP category, most population genetics tasks fall in the EP/FT categories, and many systems biology tasks fall in CG. Your example is just the entry point of the bioinformatics pipeline. Just try to compute the unrooted phylogeny of all Archaea using complete genomes and you'll see the problem.

ADD REPLY • link 14.1 years ago by Jarretinha 3.4k

0

I would go as far as to say that outside of assembly, very few bioinformatics processes require the kind of tight coupling that can't be addressed by smart distributed computing. FPGAs in particular make the work/$ equation relatively unattractive (both from cost of hardware and cost of development). GPUs on the other hand, while not suitable for all problems, especially many bioinformatics problems, do change the economics a bit.

ADD REPLY • link 14.1 years ago by Mndoci ★ 1.2k

score 2 · Answer 3 · 2010-03-17

2

14.1 years ago

Darked89 4.6k

It all depends on how diverse the applications running on this beast will be. If the end users range from DNA sequencing, NMR, and mass spec to crystallography, and the total number of applications is, say, 50+, it is unlikely you will be able to support them all, not only on FPGAs but even with CUDA. Either something installs and compiles (almost) out of the box or you may have to drop it. Software authors will be of no help when it comes to porting it (and possibly a bunch of libraries it depends on) to a new platform they do not even have in house.

On the other hand, whenever the problem is restricted to one domain, FPGAs are great. I used SORCERER for protein mass spec and DeCypher for BLAST searches.

Anyway, have fun with the new servers, whatever they turn out to be :-)

ADD COMMENT • link 14.1 years ago by Darked89 4.6k

0

Good point. The applications are the computer ... ;-)

ADD REPLY • link 14.1 years ago by Istvan Albert 100k

0

Most people here deal with sequence data and microarrays. So the basic idea is to use FPGAs for sequence data (higher demand) and CUDA for microarrays and related analyses. Molecular dynamics and related work rely on another cluster. Here, we will develop solutions on FPGAs (the ultimate dream is an FPGA card able to perform Burrows-Wheeler transform based alignments). WE are the tinkerers... Anyway, some people might want a proprietary solution. It's good to know they are worth the trouble, though.
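
For anyone curious what a Burrows-Wheeler transform based search involves, here is a minimal software sketch in Python. It is a naive illustration only: the function names are hypothetical, the transform is built by sorting all rotations (fine for toy inputs, far too slow for genomes), and the `$` sentinel is assumed not to occur in the text. Real aligners use compressed indexes and allow mismatches, which this exact-match version does not.

```python
def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler transform via sorted rotations."""
    assert sentinel not in text, "sentinel must not occur in the text"
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(text, pattern):
    """Count exact matches of pattern in text using backward search."""
    last = bwt(text)
    first = sorted(last)

    # first_row[c] = index of the first sorted rotation starting with c.
    first_row = {}
    for i, c in enumerate(first):
        first_row.setdefault(c, i)

    def occ(c, i):
        # Occurrences of c in last[:i]; a real index precomputes this.
        return last[:i].count(c)

    lo, hi = 0, len(last)  # current range of matching rotations
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        lo = first_row[c] + occ(c, lo)
        hi = first_row[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    print(bwt("banana"))
    print(count_occurrences("banana", "ana"))
```

The search loop does one step per pattern character regardless of text length, which is part of why this approach is attractive for short-read alignment.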

ADD REPLY • link 14.1 years ago by Jarretinha 3.4k