Is It Possible To Mmap A Recarray In Python 2.7?
10.2 years ago
t.c.a.smith ▴ 10

I have a large global recarray totaling 30 GB of data in a program running via qsub on a cluster with 256 GB of RAM. I am currently the only user on this cluster, so there are no conflicts over the allocation of RAM. When looping over this recarray, the system appears to shunt the object to disk rather than keep it in RAM, slowing the loop more than fivefold. I tried using mmap on the object in the following ways and received the following errors.

m = mmap.mmap(myrecarray, 0)
MMAP TypeError: only length-1 arrays can be converted to Python scalars

m = mmap.mmap(myrecarray.fileno(), 0)
AttributeError: record array has no attribute fileno

Is it possible to use mmap to hold a recarray object in RAM, or is this a total misuse of the mmap method? Or can this only be done for other object types, like strings or files?

many thanks

python • 3.2k views

This is a hardcore programming question. There are definitely Python cracks around here, but I think you would be better served by asking this on Stack Overflow. Also, if you want to keep this question open here, please construct a plausible-sounding ;) connection to bioinformatics.


Thanks for your response. I have already asked it on Stack Overflow but so far have received no response, so I thought I'd try Biostars, as we are often dealing with large amounts of data and maybe someone has come across a similar issue before. To relate it to biology: this recarray is a local genome build, to which I am mapping millions of mutations and carrying out quantitative genetic analyses. I hope that clears things up.


So here is the link to the cross post: http://stackoverflow.com/questions/21637414/is-it-possible-to-mmap-a-recarray-in-python-2-7 It is always a good idea to provide this information from the beginning.


Thanks Michael, hopefully someone out there has some experience with locking objects into RAM.


AFAIK, the mmap POSIX system call can map files and devices into memory, but what you get back is just a pointer to a vector of bytes, nothing high-level like a Python data structure.

10.2 years ago

Do you need random access to the entire array? If so, then you'll need to use something like an SQLite database. The mmap module simply makes a file available as a mutable string and/or a file object whose bytes you can seek to and read. Your recarray doesn't have a file handle because it's not a file-like object. You could read the file line by line and do what you need with it.
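
For what it's worth, NumPy also ships `numpy.memmap`, which maps a file of fixed-size records and can be viewed as a recarray. This is not what the answer above proposes, only a minimal sketch of a related option. It is written for Python 3 with NumPy installed, and the field names (`chrom`, `pos`, `score`), the dtype, and the temporary file are all hypothetical, since the thread never shows the real record layout.

```python
import os
import tempfile

import numpy as np

# Hypothetical record layout; the real recarray's fields are not given in the thread.
record_dtype = np.dtype([("chrom", "i4"), ("pos", "i8"), ("score", "f8")])

# Hypothetical scratch file standing in for the on-disk genome build.
path = os.path.join(tempfile.mkdtemp(), "records.bin")

# Write a small structured array to disk as raw, fixed-size records.
source = np.zeros(1000, dtype=record_dtype)
source["chrom"] = 1
source["pos"] = np.arange(1000)
source["score"] = np.arange(1000) * 0.5
source.tofile(path)

# Map the file read-only; the length is inferred from the file size.
mapped = np.memmap(path, dtype=record_dtype, mode="r")

# View the mapping as a recarray to get attribute-style field access
# without copying the underlying bytes.
records = mapped.view(np.recarray)

print(len(records), records.pos[10], records.score[4])
```

Whether the mapped pages then stay resident in RAM is still up to the operating system, so this on its own may not cure the slowdown described in the question.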


OK, thanks for the answer. It would be preferable to have access to the full array, but there are many other options to overcome the problem. I was initially drawn to the recarray system as it is a superfast way to construct the array in the first place, and the field names make it perfect for intuitive access. But maybe going line by line using the more conventional indexing method will be preferable. FYI, I managed to use...

m = mmap(-1,13000000000, MAP_PRIVATE)
m.write(myrecarray)

and "top" showed the correct increase in RAM in the "virt" column but a lesser increase in the "res" column. However, the loop time did not decrease, suggesting that the RAM had been reserved for the process but the object had not been fully mapped to it. Time to try a different approach, I think. Thanks a lot for your time.


If your data are already highly structured on disk, you might still be able to get away with an mmap object. Say each field in your data file is 16 bytes long. You could then access the nth field (counting from zero) by either seeking to byte 16 × n or slicing the mmap object starting at position 16 × n.
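
A rough sketch of that fixed-width idea, using only the standard library and written for Python 3. The file name, the padded `field<i>` contents, and the `read_field` helper are made up for illustration; only the 16-byte field width comes from the comment above.

```python
import mmap
import os
import tempfile

FIELD_SIZE = 16  # bytes per field, as in the example above

# Hypothetical data file: 100 fields, each padded with spaces to 16 bytes.
path = os.path.join(tempfile.mkdtemp(), "fields.dat")
with open(path, "wb") as handle:
    for i in range(100):
        handle.write(("field%d" % i).encode("ascii").ljust(FIELD_SIZE))


def read_field(mapped, n):
    """Return the nth (zero-based) fixed-width field with padding stripped."""
    start = n * FIELD_SIZE
    return mapped[start:start + FIELD_SIZE].rstrip()


with open(path, "rb") as handle:
    # Length 0 maps the whole file.
    mapped = mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        # Option 1: slice the mmap object at offset 16 * n.
        by_index = read_field(mapped, 42)
        # Option 2: seek to offset 16 * n, then read one field.
        mapped.seek(7 * FIELD_SIZE)
        by_seek = mapped.read(FIELD_SIZE).rstrip()
    finally:
        mapped.close()

print(by_index, by_seek)
```

Both routes touch only the pages that hold the requested field, so nothing forces the whole file into RAM up front.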
