As the other contributors mentioned, you can get quite a long way with high-memory, multi-core hardware: roughly 5 to 10 concurrent searches, depending on the range and size of the sequence databases. Adding more memory or cores helps (vertical scaling), but you'll see diminishing returns. For spiky or heavier usage you'll need to start distributing the load across multiple boxes (horizontal scaling).
For a more scalable system, consider provisioning a collection of servers as a processing cluster. Best practices for batch processing apply here: the nodes should be share-nothing, with tasks distributed via message queues.
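As a rough sketch of the share-nothing queue pattern, here's a toy worker pool using Python's in-process `queue.Queue` as a stand-in for a real broker (RabbitMQ, SQS, etc.); the `run_blast` function is a placeholder for invoking the BLAST binary against the worker's local database copy:

```python
import queue
import threading

def run_blast(job):
    # Placeholder: a real worker would shell out to blastall/blast+ here,
    # searching against a database copy held on the worker's own disk.
    return f"result for {job['query']} vs {job['db']}"

def worker(jobs, results):
    while True:
        job = jobs.get()
        if job is None:          # poison pill: shut this worker down
            jobs.task_done()
            break
        results.append(run_blast(job))
        jobs.task_done()

jobs = queue.Queue()
results = []
threads = [threading.Thread(target=worker, args=(jobs, results))
           for _ in range(4)]
for t in threads:
    t.start()

# Submitters just enqueue jobs; they never talk to workers directly.
for i in range(10):
    jobs.put({"query": f"seq{i}", "db": "nr"})
jobs.join()

for _ in threads:
    jobs.put(None)
for t in threads:
    t.join()
```

Because workers share nothing and only pull from the queue, adding capacity is just a matter of starting more worker processes on more boxes.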
For BLAST, it can be more cost-effective to run a larger number of less powerful servers.
A few other topics to consider when optimising such a system for throughput and cost:
Shard by database and usage
You can use different queues to route searches to specific groups of servers. Heavily used or large databases can occupy their own dedicated, heavyweight infrastructure, whilst lightly used or smaller databases can happily coexist on smaller, cheaper hardware. Monitor usage, response times and latency to gauge the best bang for buck.
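The routing decision itself can be very simple. A hypothetical sketch (the queue names and thresholds here are made up for illustration):

```python
# Databases assumed to need big-memory, dedicated nodes.
HEAVY_DATABASES = {"nr", "nt", "refseq_protein"}

def route(job):
    """Pick a queue name for a search job based on database and query size."""
    if job["db"] in HEAVY_DATABASES:
        return "blast-heavy"      # dedicated heavyweight pool
    if len(job["query"]) > 10_000:
        return "blast-large"      # very long queries get their own pool
    return "blast-light"          # cheap commodity nodes
```

With each queue drained by its own group of servers, a burst of large `nr` searches can't starve the quick, small-database searches.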
You could investigate running the searches on elastic compute services such as EC2 (*). With message queues and horizontal scaling, utility computing lets you increase capacity when demand rises and reduce it as demand subsides (evenings, weekends, etc.).
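One nice property of the queue-based design is that queue depth gives you a direct scaling signal. A minimal sketch of the sizing arithmetic (the target wait and worker limits are illustrative assumptions, not recommendations):

```python
import math

def desired_workers(queue_depth, avg_job_seconds,
                    target_wait_seconds=300,
                    min_workers=1, max_workers=20):
    """How many workers are needed so the current backlog clears
    within target_wait_seconds, clamped to a sane range."""
    needed = math.ceil(queue_depth * avg_job_seconds / target_wait_seconds)
    return max(min_workers, min(max_workers, needed))
```

A periodic job can compare this number to the running instance count and start or stop instances accordingly.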
Reduce the overhead of repeat submissions (very common with BLAST!) by caching the input parameters and search results in a database. If a user repeats a search, just return the stored result immediately.
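The key point is to canonicalise the parameters before hashing, so the same search always maps to the same cache entry. A minimal sketch, with a dict standing in for the results database and a stub in place of the real BLAST call:

```python
import hashlib
import json

cache = {}  # stand-in for a persistent results table

def cache_key(params):
    # Canonicalise (sort keys) so identical searches hash identically.
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def search(params, run_blast):
    key = cache_key(params)
    if key in cache:
        return cache[key]         # repeat submission: return immediately
    result = run_blast(params)
    cache[key] = result
    return result

# Demo: the second, identical search never reaches the cluster.
calls = []
def fake_blast(p):
    calls.append(p)
    return "hit list"

params = {"query": "MKTAYIAK", "db": "nr", "evalue": 1e-5}
first = search(params, fake_blast)
second = search(params, fake_blast)   # served from the cache
```

In practice you'd also want to invalidate entries when a database is updated, since the same query against a newer database is a different search.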
A bit OT, but important for uptake of a distributed system: make it easy for your users to submit their searches. Depending on their technical knowledge, grid-engine-style tools can help. However, for short tasks that are submitted often (such as BLAST, format exchange, Radar, Needle, etc.), some users may find them heavyweight. Instead, you can hide much of this complexity behind a thin wrapper that submits the task to a queue, then polls or awaits notification that it has completed before returning the results locally.
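Such a wrapper can make a distributed search feel like a local function call. A hypothetical sketch, with an in-process job store standing in for the cluster's queue and result store (in reality `submit` would enqueue over the network and the cluster would run elsewhere):

```python
import time
import uuid

_jobs = {}  # toy stand-in for the cluster's job/result store

def submit(query, db):
    job_id = str(uuid.uuid4())
    _jobs[job_id] = {"status": "queued", "result": None,
                     "query": query, "db": db}
    return job_id

def _cluster_tick():
    # Pretend the cluster processed everything queued since last poll.
    for job in _jobs.values():
        if job["status"] == "queued":
            job["status"] = "done"
            job["result"] = f"alignments for {job['query']}"

def blast(query, db, poll_interval=0.01, timeout=5.0):
    """Thin wrapper: submit, poll until done, return results locally."""
    job_id = submit(query, db)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        _cluster_tick()           # real code would just poll a status API
        job = _jobs[job_id]
        if job["status"] == "done":
            return job["result"]
        time.sleep(poll_interval)
    raise TimeoutError(job_id)
```

From the user's point of view, `blast("MKTAYIAK", "nr")` behaves like a local tool; the queueing, routing and caching all stay hidden behind it.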
So, in answer to your question: the horsepower of the physical hardware is only one factor in determining throughput and concurrency. A number of architectural choices can help you scale too.
(*) Heads up: I work at Amazon.