Question: Parallel Computing With Key/Value Stores (Multiple Writes)
Sequencegeek740 (UCLA) wrote, 8.6 years ago:

Hi everyone,

I've been setting up a data management system using Berkeley DB and its Python bindings (bsddb3) to run parallel tasks over ~80 million RNA sequences that have properties similar to the "Target RNA" schema below.

Target RNA
length, sequence, expression, mismatchPositions, etc.

Problem:

10 parallel processes that access separate records ("rows") and write to the database run very slowly. Each job seems to run slower than the previous one, perhaps indicating some sort of locking issue? For instance, to test updating the length of 10 million sequences, I'll run 10 separate jobs that each access a different range of records (0-1,000,000 for the first job, 1,000,001-2,000,000 for the second, etc.) like so:

# n is the job number, dbFN is the file location of the database
import bsddb3

db = bsddb3.db.DB()
db.set_cachesize(1, 0)  # 1 GB cache for this handle
db.open(dbFN, None, bsddb3.db.DB_HASH, bsddb3.db.DB_CREATE)
for i in range(n * 1000000, (n + 1) * 1000000):  # this job's slice of records
    db[str(i)] = str(int(db[str(i)]) + 1)        # read, increment, write back
db.close()

The runtimes (in seconds) for the jobs are something like 30, 50, 60, 70, etc. Running a single job to update all 10 million sequences takes ~120 s. It also seems to slow down the cluster, even though the database is only 200 MB in total. Running 100 read-only jobs works fine. I'd like to be able to run 30+ jobs concurrently, so not being able to run even 10 worries me.

Questions:

  1. Each process is accessing different records, so why would the speed of the nth job depend on the previous jobs? Doesn't BDB lock on records as opposed to entire DBs? Will manually using transactions speed this up?
  2. In my previous post there was mention of using Redis to run 100 jobs in parallel, all accessing the same in-memory store. Is this something Berkeley DB is not capable of doing because its records aren't stored in RAM like Redis's are?
  3. How can I tell which part of the system is causing the problem? Is it a BDB problem or a network/filesystem problem? I'm fairly ignorant about filesystems/networks and don't know how to troubleshoot this type of problem. Any tips would be appreciated.
  4. Are there flags I can pass to allow easier concurrency, or DB parameters I can change (like page size) to make it more efficient? (See the sketch right after this list.)
  5. If the problem is due to my network/filesystem setup, how is your system set up to allow for this type of concurrency?
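
On question 4, one thing to note: each process in the snippet above opens the database file directly, without a shared DBEnv, so the processes share no cache and no inter-process locking machinery. Below is a minimal, untested sketch of opening the same database through a shared transactional environment with bsddb3; the environment directory "dbHome" is a hypothetical placeholder, and n and dbFN are as in the snippet above. Note that the environment directory must live on a local disk, not NFS.

# Sketch only: all jobs open the database through the same transactional
# environment so BerkeleyDB can coordinate page locks between processes.
# "dbHome" is a hypothetical environment directory on local disk.
from bsddb3 import db

env = db.DBEnv()
env.set_cachesize(1, 0)                        # 1 GB shared cache
env.open("dbHome",
         db.DB_CREATE | db.DB_INIT_MPOOL | db.DB_INIT_LOCK |
         db.DB_INIT_LOG | db.DB_INIT_TXN)

d = db.DB(env)
d.open(dbFN, None, db.DB_HASH, db.DB_CREATE | db.DB_AUTO_COMMIT)

for i in range(n * 1000000, (n + 1) * 1000000):
    txn = env.txn_begin()
    key = str(i).encode()
    d.put(key, str(int(d.get(key, txn=txn)) + 1).encode(), txn=txn)
    txn.commit()

d.close()
env.close()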

System Setup/Stats
- bdb 4.8 with bsddb3
- 5 nodes (w/ Gigabit ethernet connections) using NFS
- "dd throughput": 60MB/s
- cache: 1GB for each db connection
- each db is a HASH type.

If any more info is needed I'll update ASAP.

Thanks.

EDIT:

FYI, Berkeley DB does not support concurrent database access by multiple processes when the database lives on NFS (as mentioned in the comments): http://www.oracle.com/technetwork/database/berkeleydb/db-faq-095848.html

Also, Tokyo Cabinet / Kyoto Cabinet do not support simultaneous database access by multiple processes at all - Tokyo Tyrant / Kyoto Tycoon have to be used instead, although I'm not sure whether even those are safe over NFS?

Tags: database, parallel
modified 14 months ago by amirouche.boubekki10 • written 8.6 years ago by Sequencegeek740

My experience is that using NFS with multiple nodes accessing the same information is a very bad idea. I don't know the details of this particular problem, but if it's just reading and writing simple data, you might be better off using MySQL. If the problem is more about distributing load to nodes, I'd recommend Hadoop.

written 8.6 years ago by Pablo1.9k

Re: NFS, multiple nodes writing to the same file is bad news. I tried all kinds of locking schemes and never got it to work consistently, i.e. it would fail one in a million times. Finally gave up and just wrote to separate files and aggregated them later. Inelegant, but it works.

written 8.6 years ago by Gareth Palidwor1.6k
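
A minimal sketch of the merge step for that workaround, assuming each job wrote a tab-separated "key<TAB>value" file; the job_*.tsv naming and the file layout are assumptions, not part of the original comment.

# Sketch: merge per-job result files written by independent workers.
# The job_*.tsv naming and "key<TAB>value" layout are assumptions.
import glob

merged = {}
for path in sorted(glob.glob("job_*.tsv")):
    with open(path) as fh:
        for line in fh:
            key, value = line.rstrip("\n").split("\t")
            merged[key] = value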

I think you will get some good answers on the BDB forum: http://forums.oracle.com/forums/forum.jspa?forumID=271

written 8.6 years ago by Pierre Lindenbaum124k

I'm copying the question there right now, but I figured I'd ask here as well because people might have experience with it.

written 8.6 years ago by Sequencegeek740

Thanks for the info, everyone. Will a network server interface be safe with NFS? Since only one process/server is used per database (Tyrant, Tycoon, Redis), I'd venture that it would be?

written 8.6 years ago by Sequencegeek740

My recommendation to use Redis still stands.

modified 10 weeks ago by RamRS25k • written 8.6 years ago by Aleksandr Levchuk3.2k

@Aleksandr Levchuk: Redis looks amazing, but if I can get a solution that doesn't load all the databases into memory, that would be better. I'm currently trying out both Redis and Tyrant. PS: I've referenced your example about loading 100 processes using a single database thread multiple times since that post - thanks for the reply, it was helpful.

written 8.6 years ago by Sequencegeek740

brentp23k (Salt Lake City, UT) wrote, 8.6 years ago:

It might help to show some more of your code. First, you are using processes, not threads, correct? Though it sounds like if it's slowing down the entire cluster, it's I/O limited anyway.

Next, for 80 million records you should probably use a BTree; in my experience you'll get a lot of collisions with a HASH type and it will slow down quite a lot.

If concurrency is your main concern, you might want to look at Kyoto Cabinet, a DBM designed with that in mind.

written 8.6 years ago by brentp23k
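
A sketch of what the BTree suggestion might look like in bsddb3, including an explicit page size; the 8 KB value is an arbitrary example to show where the tuning happens, not a recommendation, and dbFN is the file name from the question's snippet.

# Sketch: open the database as a BTree and set the page size before open().
# The 8 KB page size is an arbitrary example; tune it against your record size.
from bsddb3 import db

d = db.DB()
d.set_cachesize(1, 0)          # 1 GB cache
d.set_pagesize(8192)           # must be called before open()
d.open(dbFN, None, db.DB_BTREE, db.DB_CREATE)
d.put(b"0", b"21")
print(d.get(b"0"))
d.close()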

Code updated. Yes, I'm using processes, one CPU for each process. I think it might have to do with I/O too, but I don't know how to confirm it. If it were a matter of throughput, wouldn't 100 read-only processes be slow as well? Only the write processes hang. Any suggestions on how to verify I/O problems?

written 8.6 years ago by Sequencegeek740

When writes hang in MySQL it's usually because of indexes, but BDB is a different beast.

written 8.6 years ago by Jeremy Leipzig18k

You could just check top on a node and see whether the load is high; presumably it shouldn't be, since you're not doing anything computationally intensive, so a high load would point to I/O wait. Have you tried tuning the page and bucket size to match your data?

written 8.6 years ago by brentp23k
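
For completeness, a Linux-only sketch of checking the same thing programmatically: the load average plus the fraction of CPU time spent in iowait, read from /proc/stat. High load together with high iowait would point at the disk or filesystem rather than the CPU.

# Sketch: a quick check of whether a node is busy computing or mostly
# waiting on I/O (Linux-only; reads the aggregate "cpu" line of /proc/stat).
import os, time

def iowait_fraction(interval=1.0):
    def snapshot():
        with open("/proc/stat") as fh:
            return [float(x) for x in fh.readline().split()[1:]]
    a = snapshot()
    time.sleep(interval)
    b = snapshot()
    deltas = [y - x for x, y in zip(a, b)]
    return deltas[4] / sum(deltas)          # field 5 of the cpu line is iowait

print("load average:", os.getloadavg())
print("iowait fraction:", iowait_fraction())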

Jeremy Leipzig18k (Philadelphia, PA) wrote, 8.6 years ago:

Can you devise a means of writing your inserts and updates to a file, to be executed all at once after the select queries are complete? That seems to be a common strategy when concurrency is not required.

written 8.6 years ago by Jeremy Leipzig18k
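
A rough sketch of that strategy: each job only reads the database and spools its intended updates to a flat file, and a single process applies them all afterwards. The updates_*.tsv naming and layout are assumptions; n and dbFN are as in the question's snippet.

# Sketch: workers read the database and spool updates to per-job flat files;
# one process applies all updates at the end, so there are no concurrent writes.
import glob
from bsddb3 import db

# worker phase (job number n): read-only access, write intended updates out
reader = db.DB()
reader.open(dbFN, None, db.DB_HASH, db.DB_RDONLY)
with open("updates_%02d.tsv" % n, "w") as out:
    for i in range(n * 1000000, (n + 1) * 1000000):
        key = str(i).encode()
        out.write("%d\t%d\n" % (i, int(reader.get(key)) + 1))
reader.close()

# apply phase: a single writer, run after every worker has finished
d = db.DB()
d.open(dbFN, None, db.DB_HASH, db.DB_CREATE)
for path in sorted(glob.glob("updates_*.tsv")):
    with open(path) as fh:
        for line in fh:
            key, value = line.rstrip("\n").split("\t")
            d.put(key.encode(), value.encode())
d.close()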

Thanks for the reply. I can apply all the writes at once after manipulating the data (I think/hope) using transactions. The downside is that I have to store the data in RAM before writing it, and for some jobs that's exactly why I need random access. If the reason it hangs is that I do too many writes per second, then I will definitely change it to do a bulk update, but I don't know if that is the reason and want to verify it first.

written 8.6 years ago by Sequencegeek740

lh331k (United States) wrote, 8.6 years ago:

An unrelated answer. You must have good reasons to use a database and to modify the entire table in a short wall-clock time, but when you want to modify the entire database it is faster to use a single thread and write out a new table. For 10 million rows and 200 MB of data, Perl/Python can do most simple operations on a data stream in ~10 seconds.

written 8.6 years ago by lh331k
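
A sketch of that streaming rewrite, assuming the table has been dumped to a tab-separated text file with the length in the second column; the file names and column layout are assumptions.

# Sketch: rewrite the whole table in a single streaming pass instead of
# updating records in place. Assumes a tab-separated dump, length in column 2.
with open("targets.tsv") as src, open("targets.updated.tsv", "w") as dst:
    for line in src:
        fields = line.rstrip("\n").split("\t")
        fields[1] = str(int(fields[1]) + 1)      # e.g. bump the length field
        dst.write("\t".join(fields) + "\n")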

amirouche.boubekki10 wrote, 5.1 years ago:
  • Each process is accessing different records, why would the speed of the nth job depend on the previous jobs?

I don't know (yet).

  • Doesn't bdb lock on records as opposed to entire DBs?

No; with the btree and hash backends, bsddb does not lock on individual records. It uses page locking. You can tune the page size of the database to reduce the number of lock conflicts and speed things up.

  • Will manually using transactions speed this up?

I'm not sure what you mean by manual transactions. Transactions will always be slower than no transactions. It's not clear from your question whether this is for batch loading of data or for application-specific access; those are different workloads and might be addressed differently.

  • In my previous post there was mention of using Redis to run 100 jobs in parallel, all accessing the same in-memory store. Is this something Berkeley DB is not capable of doing because its records aren't stored in RAM like Redis's are?

Kind of, yes. You can configure bsddb to work fully from memory, with or without persistence of the data, so in the end it's much the same, except that loading bsddb into memory is quite a bit faster.
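
For reference, a minimal sketch of a purely in-memory bsddb database: passing None instead of a file name keeps everything in the cache, so nothing is persisted unless you add that yourself.

# Sketch: an in-memory BerkeleyDB handle. With no file name, records live
# only in the cache and disappear when the handle is closed.
from bsddb3 import db

d = db.DB()
d.set_cachesize(1, 0)                         # 1 GB in-memory cache
d.open(None, None, db.DB_HASH, db.DB_CREATE)  # None => in-memory database
d.put(b"42", b"ACGU")
print(d.get(b"42"))
d.close()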

  • How can I tell which part of the system is causing the problem?

By exploration: try different tunings and change the configuration of bsddb...

  • Is it a bdb problem or a network/filesystem problem.

NFS is a no-go for anything database related; you must use a local hard disk.

  • If the problem is due to my network/filesystem setup, how is your system set up to allow for this type of concurrency?

Check whether your jobs can conflict with each other; if not, just don't use transactions at all (but make a backup of the database first).

Regarding BTREE vs HASH, bsddb documentation says:

Choosing between BTree and Hash

For small working datasets that fit entirely in memory, there is no difference between BTree and Hash. Both will perform just as well as the other. In this situation, you might just as well use BTree, if for no other reason than the majority of DB applications use BTree.

Note that the main concern here is your working dataset, not your entire dataset. Many applications maintain large amounts of information but only need to access some small portion of that data with any frequency. So what you want to consider is the data that you will routinely use, not the sum total of all the data managed by your application.

However, as your working dataset grows to the point where you cannot fit it all into memory, then you need to take more care when choosing your access method. Specifically, choose:

• BTree if your keys have some locality of reference. That is, if they sort well and you can expect that a query for a given key will likely be followed by a query for one of its neighbors.

• Hash if your dataset is extremely large. For any given access method, DB must maintain a certain amount of internal information. However, the amount of information that DB must maintain for BTree is much greater than for Hash. The result is that as your dataset grows, this internal information can dominate the cache to the point where there is relatively little space left for application data. As a result, BTree can be forced to perform disk I/O much more frequently than would Hash given the same amount of data.

Moreover, if your dataset becomes so large that DB will almost certainly have to perform disk I/O to satisfy a random request, then Hash will definitely outperform BTree because it has fewer internal records to search through than does BTree.

http://docs.oracle.com/cd/E17076_04/html/toc.htm

written 5.1 years ago by amirouche.boubekki10

amirouche.boubekki10 wrote, 14 months ago:

I tested bsddb further in multi-process settings and it's a pain.

I now recommend FoundationDB, which has a similar API.

Good luck and Happy hacking!

written 14 months ago by amirouche.boubekki10
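
A minimal sketch with the FoundationDB Python client (`fdb`), assuming a running cluster and the default cluster file; the keys and values shown are placeholders, not part of the original answer.

# Sketch: basic key/value access with the FoundationDB Python client.
# Assumes a reachable cluster and the default cluster file.
import fdb

fdb.api_version(620)
db = fdb.open()

db[b"target:42:length"] = b"21"       # placeholder key/value
print(db[b"target:42:length"])

@fdb.transactional
def set_many(tr, items):
    # group several writes into one atomic transaction
    for key, value in items:
        tr[key] = value

set_many(db, [(b"target:1:length", b"20"), (b"target:2:length", b"19")])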