Question: Machine Spec For Running A Blast Service For ~50 Users
Lyco (Germany) wrote 8.5 years ago:

I have been asked this question by a friend, but I didn't feel I could give a satisfactory answer. Maybe someone here at BioStar can share some experience.

The question is: what kind of hardware would you need to offer a BLAST service for about 50 different users, with an expected average of 5-10 concurrent requests, which might be higher at peak times. A wide range of different databases should be offered, ranging from single bacterial genomes to NCBI-style 'nr' databases.

My initial answer was that the number of concurrent BLAST runs should be kept to a minimum by using a queuing system (similar to what is done at NCBI, EBI and the other big centers). The main problem in this particular setting is that the majority of BLAST searches are not via a fancy Web interface but rather command-line requests, generating tabular or XML output.

Maybe I should ask two questions: 1) what kind of hardware would you need (# of CPUs, how much RAM, RAID?), and 2) does anyone have experience with a queuing system for command-line BLAST?

server webservice blast hardware
written 8.5 years ago by Lyco
Alastair Kerr (The University of Edinburgh, UK) wrote 8.5 years ago:

I do not think that 50 users is that many, and a cluster would be overkill, especially for regular BLAST usage, i.e. sporadic use per person, with nobody constantly using BLAST as a short-read mapper or constantly running multiple large genomes against nr.

A single dual-processor server with about 24 threads and about 64 GB RAM would cope easily in the above scenario (these are the specs of one of my servers, which serves a similar number of users and is used for a lot more than just BLAST).

That would handle 10 concurrent BLAST jobs without the need for a queuing system. If it became a problem, you could always wrap the BLAST process and submit it to a queuing system such as Torque (which uses the qsub command).
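The wrapping idea above can be sketched roughly as follows. This is only an illustration: the queue resources, database name, and the `blastall` flags are assumptions, and the toy query is created inline just so the script is self-contained.

```shell
# Hypothetical wrapper: queue a command-line BLAST search via Torque's qsub.
# Resource requests and the database name are assumptions for illustration.

QUERY=query.fasta
DB=nr
OUT=${QUERY%.fasta}.blast.xml

# A toy query so the sketch is self-contained.
printf '>seq1\nMKVLAA\n' > "$QUERY"

# Build a Torque job script; -N names the job, -l requests resources.
cat > "$QUERY.job" <<EOF
#PBS -N blast_${QUERY}
#PBS -l nodes=1:ppn=4,mem=8gb
blastall -p blastp -d $DB -i $QUERY -m 7 -o $OUT -a 4
EOF

# Submit if Torque is available; otherwise just report the job script.
if command -v qsub >/dev/null 2>&1; then
    qsub "$QUERY.job"
else
    echo "qsub not available; job script written to $QUERY.job"
fi
```

Users keep their familiar command-line workflow; the only change is calling the wrapper instead of blastall directly, and the queue enforces the concurrency limit.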

written 8.5 years ago by Alastair Kerr

@Alastair, thanks for this advice. Being no hardware person, I am a bit puzzled by your '24 threads on a dual-processor machine'. Does it make sense to run BLAST with more threads than there are processor cores? Or do you have 12-core processors?

written 8.5 years ago by Lyco

Sounds like a Westmere system with 24 execution threads. I agree that 50 is not that many users. How many concurrent processes do you tend to have?

written 8.5 years ago by Mndoci

Deepak, could you please offer some explanation for the biologist-bioinformatician with limited hardware expertise? I looked up Westmere on Wikipedia but only found references to 2-8 core processors. Can you run more than one thread per core, and does it make sense?

written 8.5 years ago by Lyco

Each CPU has 6 physical cores, giving 12 logical cores per CPU. The technology behind this is Intel's "hyper-threading", and AMD has something similar (found on its latest chips). The operating system sees 24 CPUs and schedules jobs on each. It is much faster than an equivalent 12-node cluster from 6 years ago.
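On Linux you can check the physical-core vs. logical-CPU split directly; a sketch (lscpu field names vary a little between util-linux versions, and the fallback only reports logical CPUs):

```shell
# Show sockets, cores per socket, and threads per core to see whether
# hyper-threading is in play (lscpu ships with util-linux).
if command -v lscpu >/dev/null 2>&1; then
    lscpu | grep -E '^(CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket|Socket\(s\))'
else
    # Fallback: total logical CPUs only.
    getconf _NPROCESSORS_ONLN
fi
```

If "Thread(s) per core" reports 2, the 24 "CPUs" the OS sees are 12 physical cores with hyper-threading enabled.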

written 8.5 years ago by Alastair Kerr


The most important question really is the type of data that is going to be run through BLAST. People running a handful of queries sporadically are very different from people running large datasets.

written 8.4 years ago by Dan Gaston
Mza (Cambridge) wrote 8.5 years ago:

As the other contributors mentioned, you can get quite a long way with high-memory, multi-core hardware for ~5 to 10 concurrent searches, depending on the range and size of the sequence databases. Adding more memory or cores will help (vertical scaling), but you'll see diminishing returns. For spiky or heavier usage you're going to need to start distributing the load across multiple boxes (horizontal scaling).

For a more scalable system, consider provisioning a collection of servers as a processing cluster. Best practices for batch processing apply here: the nodes should be 'share nothing' with tasks distributed via message queues.

For BLAST, it can be more cost effective to run a larger number of less powerful servers.

A few other topics to consider to optimise such a system for throughput and cost:

Shard by database and usage

You can provide different queues to route searches to specific groups of servers. High use or large datasets can occupy their own dedicated, heavyweight infrastructure, whilst lower usage or smaller datasets can happily coexist on smaller, cheaper hardware. Monitor usage, response times and latency to gauge the best bang for buck.

Queue-aware compute

You could investigate the possibility of running the searches against elastic compute with services such as EC2 (*). With message queues and horizontal scaling, running on utility computing can allow you to increase your capacity under increased demand, and reduce it as demand subsides (evenings, weekends etc).

Caching

Reduce the overhead of repeat submission (very common with BLAST!), by caching the input parameters and search results in a database. If a user repeats a search, just return the result immediately.
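A minimal sketch of that caching idea: key the cache on a hash of the query sequence plus all search parameters, and only run the search on a miss. Here `run_blast` is a stand-in for the real blastall/BLAST+ invocation, and the cache directory name is an assumption.

```shell
# Stand-in for the real BLAST call; replace with blastall/blastp etc.
run_blast() {
    echo "RESULT for: $*"
}

CACHE_DIR=./blast_cache
mkdir -p "$CACHE_DIR"

# Run a search through the cache: hash (query contents, db, parameters)
# and reuse a stored result file if one exists for that key.
cached_blast() {
    local query=$1 db=$2 params=$3
    local key hit
    key=$(printf '%s|%s|%s' "$(cat "$query")" "$db" "$params" | sha256sum | cut -d' ' -f1)
    hit=$CACHE_DIR/$key
    if [ ! -f "$hit" ]; then
        run_blast "$query" "$db" "$params" > "$hit"   # miss: compute and store
    fi
    cat "$hit"                                        # hit or freshly computed
}

printf '>q\nMKV\n' > q.fasta
cached_blast q.fasta nr "-p blastp"   # computes and caches
cached_blast q.fasta nr "-p blastp"   # served from cache
```

In a real deployment the cache would also be keyed on (or invalidated by) the database version, so results don't go stale across updates.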

Friendly wrappers

A bit OT, but important for uptake of a distributed system: make it easy for your users to submit their searches. Depending on the technical knowledge of your users, grid-engine-style tools can help. However for short tasks which are submitted often (such as BLAST, format exchange, Radar, Needle, etc), some users may find them heavyweight. Instead, you can hide a lot of this complexity by providing a thin wrapper to your users that submits their task to a queue, and polls or awaits notification that the task has completed before returning results locally.
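The submit-and-poll wrapper described above can be sketched with a spool directory standing in for the real message queue; the `worker` function simulates what the compute nodes would do, and all names here are illustrative.

```shell
# Sketch of a "thin wrapper": the user calls one command, which enqueues
# the task, polls for completion, and prints the result locally.
SPOOL=./spool
mkdir -p "$SPOOL"

# Enqueue a task and print its id.
submit() {
    local id=$RANDOM$$
    printf '%s\n' "$*" > "$SPOOL/$id.task"
    echo "$id"
}

# Toy worker: in a real system this loop runs on the compute nodes.
worker() {
    for t in "$SPOOL"/*.task; do
        echo "done: $(cat "$t")" > "${t%.task}.result"
        rm -f "$t"
    done
}

# What the user actually runs: submit, wait, print.
blast_submit_and_wait() {
    local id
    id=$(submit "blastp $1 vs $2")
    worker                                               # simulate the remote side
    while [ ! -f "$SPOOL/$id.result" ]; do sleep 1; done # poll for completion
    cat "$SPOOL/$id.result"
}

blast_submit_and_wait q.fasta nr
```

From the user's point of view this behaves exactly like a local command-line BLAST run, which is what keeps uptake high.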

So - in answer to your question, the horsepower of the physical hardware is only one factor in determining throughput and concurrency. There are a number of architectural factors that can help you scale up too.

(*) Heads up: I work at Amazon.
written 8.5 years ago by Mza

Thanks for your thoughtful comments. Some won't work for my friend's problem (EC2, caching); with BLAST caching I'm not sure it really makes sense at all. Are there really that many repeated searches between database updates? In my own (small) group, we are running BLAST on two 4-core machines with 10 GB RAM each, which works for our purposes but certainly not for multiple concurrent searches.

written 8.5 years ago by Lyco

At a large enough scale, caching will save you compute effort: people tend to resubmit queries rather than save them locally. For large scale, concurrent searches, distribution is definitely the way to go.

written 8.5 years ago by Mza
Chris (Munich) wrote 8.5 years ago:

Regarding 2): We use SGE [1] here on our group-internal compute cluster, currently consisting of 600 CPUs. Of course it is generic cluster queueing software and its usage isn't BLAST-specific, but it is easy to conduct parallel BLAST runs on that platform. In fact, this is one of the major use cases in our group.

Chris

[1] http://en.wikipedia.org/wiki/Oracle_Grid_Engine
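A parallel BLAST run under SGE is often done by splitting the query FASTA into chunks and submitting them as an array job, where each task reads the chunk matching its `$SGE_TASK_ID`. A sketch, with illustrative file names and a toy input; the `blastall` flags and chunk count are assumptions:

```shell
QUERY=all_queries.fasta
CHUNKS=4

# Toy input so the sketch is self-contained.
printf '>a\nMK\n>b\nLV\n>c\nAA\n>d\nGG\n' > "$QUERY"

# Round-robin the FASTA records into chunk.1.fasta .. chunk.N.fasta.
awk -v n=$CHUNKS '/^>/{i=(i%n)+1} {print > ("chunk." i ".fasta")}' "$QUERY"

# Array job script: SGE runs tasks 1..4, each with its own $SGE_TASK_ID.
cat > blast_array.sh <<'EOF'
#!/bin/sh
#$ -t 1-4
blastall -p blastp -d nr -i chunk.$SGE_TASK_ID.fasta \
         -m 8 -o chunk.$SGE_TASK_ID.out -a 1
EOF

if command -v qsub >/dev/null 2>&1; then
    qsub blast_array.sh
else
    echo "qsub not available; array job script written to blast_array.sh"
fi
```

The per-chunk outputs can simply be concatenated afterwards, since tabular (-m 8) output has no header.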

written 8.5 years ago by Chris

I should add that a scenario like this comes with some issues. One major problem we encountered was that a large number of concurrent BLAST accesses to the underlying sequence database produced network and I/O load that a single NFS server could no longer handle. The solution was to maintain node-local copies of the database and restrict access to those.
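Keeping those node-local copies fresh is typically a periodic sync from the master copy. A sketch, using local stand-in directories for what would be the NFS export and the node's scratch disk (both paths are assumptions):

```shell
MASTER=./master_db          # stands in for e.g. /nfs/blastdb on the file server
LOCAL=./local_db            # stands in for e.g. /scratch/blastdb on each node

mkdir -p "$MASTER"
printf 'dummy\n' > "$MASTER/nr.phr"       # toy database file

# rsync only transfers changed files, so a cron job can run this often;
# --delete removes local files dropped from the master copy.
if command -v rsync >/dev/null 2>&1; then
    rsync -a --delete "$MASTER/" "$LOCAL/"
else
    mkdir -p "$LOCAL" && cp -r "$MASTER/." "$LOCAL/"
fi

ls "$LOCAL"
```

BLAST jobs are then pointed at the local path, so search I/O never touches the network.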

written 8.5 years ago by Chris

Thanks for your answer. I will forward the SGE idea to my friend. With regard to your comment: this is something that even we see here in my small group. Running BLAST via NFS is a bad idea, so we keep node-local copies of all databases.

written 8.5 years ago by Lyco

Do you have any experience with how well BLAST scales beyond 4 threads? I am currently having issues running BLAST+ in multi-threaded mode (this has been reported as fixed, but isn't), so I keep using blastall.

written 8.5 years ago by Lyco

Indeed, we once ran an investigation to answer that question. The result was that BLAST runtime does not decrease linearly with the number of threads. If I remember correctly, beyond 3-4 threads there was no significant gain in speed any more.

written 8.5 years ago by Chris

It's my understanding that scaling with the number of threads is highly dependent on both the database being searched and the input queries. If you have a large number of queries (whole-genome data), scaling to multiple threads makes a big difference. If you only have one or a handful of queries it won't, as they parallelize in different ways. I usually run hundreds to thousands of queries at a time, and BLAST+ seems to scale above 4 threads quite well, though I haven't bothered to actually benchmark it.

written 8.4 years ago by Dan Gaston
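Settling this kind of disagreement is easy to do empirically: time the same search at several thread counts and compare wall-clock times. A sketch, where `run_search` is a stand-in for the real call (with BLAST+ you would vary `-num_threads`, with blastall the `-a` flag):

```shell
# Stand-in for the real search at a given thread count; replace the body
# with e.g.: blastp -num_threads "$1" -db nr -query big.fasta
run_search() {
    sleep 0
    echo "threads=$1"
}

# Time the same workload at 1, 2, 4 and 8 threads.
for t in 1 2 4 8; do
    start=$(date +%s)
    run_search "$t" > /dev/null
    end=$(date +%s)
    echo "$t threads: $((end - start)) s" >> scaling.txt
done
cat scaling.txt
```

With a realistic query set this makes the query-count dependence visible directly, instead of arguing from memory.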
Powered by Biostar version 2.3.0