CUDA/GPU Processing and Bioinformatics
5
10
13.4 years ago
Matt ▴ 110

Does anybody see any progress on the horizon for leveraging GPU processing against computationally challenging problems, e.g. de novo assembly? I know that there are a few (older) threads over at SEQanswers, and MUMmerGPU is one option for alignment...

assembly hardware • 9.2k views
16
13.4 years ago
Mrawlins ▴ 430

Most of the difficult problems in biology are readily reduced to graph or set problems. They're hard not because the math is difficult but because they involve large datasets with lots of inter-dependencies. Your example of de novo assembly is a good one. Doing assembly on 300 million reads doesn't require any difficult computing, just an awful lot of it. In the worst case it has to compare every read with every other read, which would take years at least. Everything I've seen GPU processing used for involves floating-point calculations. Things like protein folding or molecular orbital energy require a lot of complex math, which is why they get such a boost out of using GPUs, which are designed to do that kind of math more quickly than CPUs.
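To put a rough number on the worst case above, here is a minimal sketch that counts the read pairs in a naive all-vs-all comparison and converts that to wall-clock time. The rate of 1e9 comparisons per second is purely an assumed figure for illustration, and the function names are hypothetical; real assemblers avoid comparing every pair.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def all_pairs(n_reads: int) -> int:
    """Number of unordered read pairs in a naive all-vs-all comparison."""
    return n_reads * (n_reads - 1) // 2


def years_needed(n_reads: int, comparisons_per_second: float) -> float:
    """Wall-clock years to compare every pair at a fixed (assumed) rate."""
    return all_pairs(n_reads) / comparisons_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    n = 300_000_000                 # read count from the post above
    rate = 1e9                      # assumed comparisons per second
    print(f"{all_pairs(n):.2e} pairs")
    print(f"{years_needed(n, rate):.1f} years at {rate:.0e} comparisons/s")
```

The pair count grows quadratically with the number of reads, so doubling the data roughly quadruples the work, whatever hardware does the comparing.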

Most everything I've done in bioinformatics has been memory-limited. GPU processing won't help with that. For the many problems in bioinformatics where memory is the limiting factor, GPU processing won't be worth the effort or the money. I don't see that changing any time in the next few years.

2

I'll add that there are lots of processes (like short-read mapping) that are disk I/O limited, especially on multi-core machines. GPUs don't help with that either.

1

I totally agree: it is the quantity of data in 'routine' cases that is the problem, not the nature of the computation.

3
13.4 years ago
lh3 33k

It would be good to study GPU-based algorithms as a research project, but I do not see them being of much practical use in sequence analysis in the near future. Here are the reasons:

  1. GPUs are costly. To use GPU-based algorithms, you have to put one or two graphics cards in each node of a compute farm. However, only the few programs that support GPUs can benefit from that. If you spend the same money on more CPUs instead, all programs will benefit.

  2. GPUs are not fast enough, even when comparing different implementations of the same algorithm. GPU-powered programs are fast, but they are only a few times faster than the best alternative. For example, with one GeForce 8800, GPU-SW (Manavski and Valle, 2007) is about the same speed as SSE2-SW (Farrar, 2006). MUMmerGPU is only 3X faster than MUMmer. I would guess that accelerating MUMmer with SSE2 (if at all possible) may deliver similar performance.

  3. Improvements to algorithms are frequently more effective. Take Smith-Waterman alignment (SW) as the example again. The initial version of BWT-SW does not use GPU or SSE2, but it is hundreds of times faster than GPU-SW (and thousands of times faster than the standard implementation). HMMER3 is another example. Although the use of SSE2 is one of the major boosts, improvements to the underlying algorithms also play an important role. Also, as you said, MUMmerGPU is an option, but there are much better programs for NGS applications.
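For readers who have not seen it, the computation that tools such as GPU-SW and SSE2-SW speed up is the Smith-Waterman dynamic-programming recurrence. Below is a minimal, unoptimized sketch of the score-only version in plain Python. It assumes a linear gap penalty and arbitrary scoring parameters (`match`, `mismatch`, `gap` are illustrative choices), and it is not the code of any of the programs named above.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)   # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal can be computed independently. That independence is what SIMD and GPU implementations exploit, while the total work stays proportional to the product of the two sequence lengths.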

Some people are optimistic about GPU-based algorithms because they think a GPU has many more "cores" than a CPU. But in practice, we can hardly ever reach the theoretical performance because of I/O and the restrictions of the algorithms. Studying GPU-based algorithms is more of research interest than of practical use. I see SSE2 as much more promising.
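One simple way to see why the theoretical performance is hard to reach is Amdahl's law, used here as an illustrative model rather than anything taken from the papers cited above: if only a fraction of the runtime can be offloaded, the rest (for example I/O) caps the overall gain. The function name and the example numbers in this sketch are assumptions.

```python
def overall_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only part of it is accelerated.

    accelerated_fraction: share of the original runtime that runs on the GPU
    kernel_speedup: how many times faster that share becomes
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    serial = 1.0 - accelerated_fraction
    return 1.0 / (serial + accelerated_fraction / kernel_speedup)


if __name__ == "__main__":
    # Assumed example: half the runtime is I/O, the other half gets 100x faster.
    print(f"{overall_speedup(0.5, 100.0):.2f}x overall")
```

Under this model, a program that spends half its time on I/O cannot become more than twice as fast, no matter how many cores the accelerator has.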

EDIT: I am not qualified to predict the long-term trend of GPU computing. The trend will depend on the evolution of GPUs and many-core CPUs, but I have little idea about that. I can only predict that in a year or two, GPU will be of little practical use in sequence analyses.

EDIT 20101203: This link could be interesting to someone, although the use of DFS worries me a little: using DFS is faster but less accurate.

EDIT 20110110: GPU-BLAST has been published, with a reported 3-4 fold speedup. I believe an SSE2-powered BLAST would be faster. Still no sign of GPU computing gaining ground in sequence analysis.

3

I'm sorry, but I totally disagree. GPUs are one of the cheapest options if you look at the price per GFLOP. It all depends on how well your algorithm can be parallelized in theory and how well this has been done in a given application. GPU computing can be tremendously powerful, but for most biological applications it isn't there just yet (for that, GPUs will need more memory and maybe some changes in architecture).

1

Perhaps you are right given 5 to 10 years, but will this happen in a couple of years? I do not see it. Also, 64-core CPUs have already been developed in the lab. I forget whether the architecture is symmetric or whether it has other practical problems, but CPUs may also be vastly improved in 5 to 10 years. I know GPUs are much cheaper in terms of price per GFLOP, but all that matters is whether we can get the speed in practice. When the first few GPU-powered algorithms were published in 2007, I told my friends that they would not be popular within a couple of years. I was right that time, although I may be wrong this time.

0

Another thing: you are using old speed comparisons. Look at how the raw performance of CPUs vs. GPUs has changed in the last 2 years. This gap will continue to grow, eventually making GPU computing feasible for many applications for which it isn't quite ready yet.

2
13.4 years ago
Darked89 4.6k

If you trust benchmarks published by the vendor: http://www.nvidia.com/object/bio_info_life_sciences.html

GPU HMMER looks impressive, but they compare it to HMMER 2.0. HMMER 3.0 is way faster: http://hmmer.janelia.org/

1
12.0 years ago
Biostar User ▴ 360

Another project related to GPU and MSA: http://gpualign.cs.put.poznan.pl/

0
12.0 years ago

Also look at biomanycores, a good resource for libraries for the Bio* frameworks.
