Slow BLAST speed when the database is on an external drive?
5 months ago
theclubstyle

Morning All,

I'm running a fairly unremarkable BLASTn search against the nt_prok database, and I'm seeing a huge speed difference between having the database on an external drive and having it on the local drive: the external drive is much slower. I'm guessing this is down to I/O speed, but this is a reasonably quick SSD (1 GB/s). Using the local drive is workable for now but not in the long run.
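
One way to test the I/O explanation is to time a plain sequential read of one large database file on each drive. This is a minimal Python sketch: the function name `read_throughput` is hypothetical, and nothing in it is BLAST-specific. It returns the number of bytes read and the rate in MB/s.

```python
import time


def read_throughput(path, block_size=8 * 1024 * 1024):
    """Read `path` sequentially and return (bytes_read, MB/s).

    A repeat run on the same file may be served from the OS page cache,
    so the first (cold) run is the one that reflects the drive itself.
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 gives a raw file object, so each read() goes straight to the OS
    with open(path, "rb", buffering=0) as fh:
        while True:
            chunk = fh.read(block_size)
            if not chunk:  # empty bytes means end of file
                break
            total += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return total, total / elapsed / 1e6
```

For example, call `read_throughput()` once on a large nt_prok volume file on the external SSD and once on a copy on the local drive. Use files that haven't been read recently: a repeat run may come from RAM and look much faster than the drive really is.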

Given the decent I/O speeds, might this be a cache issue? Is there any way to use the SSD as a cache drive while still storing the database on it? (Apologies if I'm barking up the wrong tree, but I'm not much of a systems engineer!)
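
On the cache question: the operating system already caches recently read file data in RAM (the page cache), so the practical equivalent of a "cache drive" is having that data in memory. A minimal sketch, assuming all the database files share the `nt_prok` basename in one directory, is to read them once before the search so that later reads may come from memory. `warm_page_cache` is a hypothetical helper, and it only helps if there is enough free RAM to keep the files cached until BLAST reads them.

```python
import glob
import os


def warm_page_cache(db_dir, prefix, block_size=8 * 1024 * 1024):
    """Read every file in db_dir whose name starts with prefix, discarding the data.

    The side effect is that the OS may keep the file contents in its page
    cache (RAM); the total number of bytes read is returned.
    """
    pattern = os.path.join(glob.escape(db_dir), glob.escape(prefix) + "*")
    total = 0
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as fh:
            while chunk := fh.read(block_size):
                total += len(chunk)
    return total
```

For instance, `warm_page_cache("/path/to/blastdb", "nt_prok")` (placeholder path) could be run just before launching blastn.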

Is it a USB-C drive?

How much memory do you have available? Any nt* databases are going to need several tens of GB of memory. Perhaps that is your real issue.
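
A quick way to check this on Linux is to compare the on-disk size of the database files with `MemAvailable` from `/proc/meminfo`. This is a rough sketch under two assumptions: the database files share a common prefix, and a database larger than available memory will have to be re-read from the drive during the search. The helper names are hypothetical.

```python
import glob
import os


def db_size_bytes(db_dir, prefix):
    """Total size of the files in db_dir whose names start with prefix."""
    pattern = os.path.join(glob.escape(db_dir), glob.escape(prefix) + "*")
    return sum(os.path.getsize(p) for p in glob.glob(pattern) if os.path.isfile(p))


def mem_available_bytes(meminfo_text):
    """Parse the MemAvailable line of Linux /proc/meminfo (reported in kB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemAvailable not found")


def fits_in_ram(db_dir, prefix, meminfo_path="/proc/meminfo"):
    """True if the database files are no larger than currently available memory."""
    with open(meminfo_path) as fh:
        available = mem_available_bytes(fh.read())
    return db_size_bytes(db_dir, prefix) <= available
```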

5 months ago

You can use top or htop to check the status of the job. Is it using as much CPU as you would expect, or less? The status column shows whether the process is running (R), sleeping (S), or in uninterruptible sleep (D), which usually means it is waiting on disk I/O. A lot of time in D points at an I/O bottleneck.
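
If you'd rather not watch top by eye, the same state letter can be sampled from `/proc/<pid>/stat` on Linux. This minimal sketch (helper names are hypothetical) tallies the states of a running process so you can see what fraction of samples are D. Note that `/proc/<pid>/stat` reflects the main thread only; per-thread states of a multithreaded job live under `/proc/<pid>/task/<tid>/stat`.

```python
import collections
import time


def parse_state(stat_line):
    """Return the one-letter state from a Linux /proc/<pid>/stat line.

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces or ')', so split on whatever follows the last ')'.
    """
    return stat_line[stat_line.rfind(")") + 1:].split()[0]


def sample_states(pid, samples=50, interval=0.2):
    """Tally the states seen for pid over roughly samples * interval seconds."""
    counts = collections.Counter()
    for _ in range(samples):
        with open(f"/proc/{pid}/stat") as fh:
            counts[parse_state(fh.read())] += 1
        time.sleep(interval)
    return counts
```

For example, `sample_states(pid)` with the blastn PID shown in top returns a `Counter` of state letters; a large share of D samples suggests the process spends most of its time waiting on the disk.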

When running with many cores, it's easy to run into these I/O issues.

It could be that your disk speed is hampered by the interface (e.g., are you going through a dongle?)

This is useful, thanks! When running from the SSD, the job spends much more time sleeping than running compared with the local disk, and the CPU usage is far lower (by more than 50%). Usually I multithread these things, but single-threading doesn't seem to make much difference. I think the bottleneck is the first read of the database into memory/cache, which I believe is single-threaded in BLAST and, for this particular job, takes the largest share of the overall runtime.

I guess I'll just have to plead with my PI for a hardware upgrade...

Frankly, instead of buying some X for a workstation that a few people use from time to time, it may be better to pay for an account on a cluster. This may include regular backups, professional help with software installation, and usually much faster data processing than even a high-end workstation can deliver. With academic grants/discounts, that should be a safer and better option than having a PhD student or postdoc figure out, in breaks from wet-lab work, the optimal hardware configuration or how to install Y.

5 months ago

Are you sure the external SSD is plugged into a USB 3 port? USB 2 (480 Mbit/s) will slow things down a lot.
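
On Linux, the negotiated link speed of each connected USB device is exposed in `/sys/bus/usb/devices/*/speed` (in Mbit/s), which shows whether the drive actually came up at USB 2 or USB 3 speed. This is a sketch with a hypothetical function name; a drive in a Thunderbolt/PCIe enclosure may not appear under the USB tree at all.

```python
import os

# Link speeds (Mbit/s, as written by Linux sysfs) mapped to the usual USB names.
USB_SPEED_NAMES = {
    "1.5": "USB 1.x Low Speed",
    "12": "USB 1.x Full Speed",
    "480": "USB 2.0 High Speed",
    "5000": "USB 3.x SuperSpeed (5 Gbit/s)",
    "10000": "USB 3.x SuperSpeed+ (10 Gbit/s)",
}


def usb_link_speeds(root="/sys/bus/usb/devices"):
    """Map each USB device entry to (product name, speed label)."""
    result = {}
    if not os.path.isdir(root):
        return result
    for name in sorted(os.listdir(root)):
        speed_file = os.path.join(root, name, "speed")
        if not os.path.isfile(speed_file):
            continue  # entries without a speed file are skipped
        with open(speed_file) as fh:
            speed = fh.read().strip()
        product = ""
        product_file = os.path.join(root, name, "product")
        if os.path.isfile(product_file):
            with open(product_file) as fh:
                product = fh.read().strip()
        # Unknown values are shown as raw Mbit/s rather than guessed at
        result[name] = (product, USB_SPEED_NAMES.get(speed, speed + " Mbit/s"))
    return result
```

If the SSD shows up as USB 2.0 High Speed, its throughput is capped at a theoretical 60 MB/s, far below the drive's rated 1 GB/s.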

The cheapest way forward might be to add a new internal SSD, or upgrade your current one if no extra drive bay exists in the machine.

I'd try to get a workstation for this kind of analysis, though, or use a university server.

You can check drive and application throughput and behaviour using glances, a Python tool.
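
glances (which builds on psutil) reads the kernel's per-disk I/O counters. If you only want the raw numbers, a standard-library sketch can read `/proc/diskstats` directly on Linux and turn two snapshots into read/write MB/s per device. The helper names are hypothetical; the column positions follow the kernel's documented diskstats layout, where sectors are always counted as 512 bytes.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors regardless of the device


def parse_diskstats(text):
    """Return {device: (sectors_read, sectors_written)} from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue
        # major minor name reads merged sectors_read ms_reading writes merged sectors_written ...
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def throughput_mb_s(before, after, seconds):
    """Per-device (read MB/s, write MB/s) between two parse_diskstats snapshots."""
    rates = {}
    for dev, (r1, w1) in after.items():
        if dev in before:
            r0, w0 = before[dev]
            rates[dev] = ((r1 - r0) * SECTOR_BYTES / seconds / 1e6,
                          (w1 - w0) * SECTOR_BYTES / seconds / 1e6)
    return rates


def measure(interval=5.0, path="/proc/diskstats"):
    """Snapshot diskstats, wait about `interval` seconds, snapshot again, return rates."""
    with open(path) as fh:
        before = parse_diskstats(fh.read())
    start = time.monotonic()
    time.sleep(interval)
    with open(path) as fh:
        after = parse_diskstats(fh.read())
    return throughput_mb_s(before, after, time.monotonic() - start)
```

Running `measure()` while the BLAST job is going shows whether the external drive's read rate sits far below its rated speed while the process is mostly in the D state.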
