pysam segmentation fault with multiprocessing
4.7 years ago
wolffj • 0

Hello,

I have already posted this as an issue on the pysam GitHub repository, but I have not received an answer so far. Maybe one of you can help.

I have the following issue and do not know whether it is my fault or a bug in pysam:

I read data from a SAM file, store it in a buffer, and hand that buffer over to a new Process from Python's multiprocessing library. Inside the new process I can access the data without any issue, but when I return (via a Queue) a new buffer containing those elements of the old buffer that satisfy certain conditions, I get a segmentation fault.

Here is a code example that should make clearer what is going on.

import pysam
import multiprocessing

def readData(pFileOneIterator, pFileTwoIterator, pNumberOfItemsPerBuffer):
    j = 0
    buffer_1 = []
    buffer_2 = []
    while j < pNumberOfItemsPerBuffer:
        j += 1
        try:
            mate1 = pFileOneIterator.next()
            mate2 = pFileTwoIterator.next()
        except StopIteration:
            break
        # if some conditions on mate1 and mate2 fit
        buffer_1.append(mate1)
        buffer_2.append(mate2)
    return buffer_1, buffer_2

def process_data(pMateBuffer1, pMateBuffer2, pQueue):
    # process data
    i = 0
    buffer_out = []
    while i < len(pMateBuffer1):
        mate1 = pMateBuffer1[i]
        mate2 = pMateBuffer2[i]
        # if some conditions are fulfilled, add
        buffer_out.append(mate1)
        buffer_out.append(mate2)
        i += 1
    # this access works
    print buffer_out[0].flag
    pQueue.put(buffer_out)

def main(args=None):
    # open the alignment files
    str1 = pysam.Samfile('File1', 'rb')
    str2 = pysam.Samfile('File2', 'rb')

    buffer_1, buffer_2 = readData(str1, str2, 500000)
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=process_data, kwargs=dict(
        pMateBuffer1=buffer_1,
        pMateBuffer2=buffer_2,
        pQueue=queue
    ))
    p.start()
    # let the process compute, then get the data back
    result = queue.get()
    # the type is still visible:
    print type(result[0])  # prints <type 'pysam.libcalignedsegment.AlignedSegment'>
    # but this access raises a segmentation fault
    print result[0].flag  # Segmentation fault (core dumped)
    p.join()


Why can I copy the data from process A to B, but it fails to copy the data back from B to A? Am I doing something wrong, and if so, what? To me it looks as if some pointers or references in the underlying data structure are invalidated.
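For context on what a workaround could look like: multiprocessing.Queue pickles everything it transports, so one common pattern is to convert each record to plain Python data before put() and rebuild it after get(). Modern pysam offers AlignedSegment.to_string() and AlignedSegment.fromstring() for this (fromstring() additionally needs a header object). A minimal sketch of the pattern with a hypothetical Read stand-in class, so it runs without pysam:

```python
import pickle


class Read(object):
    """Hypothetical stand-in for pysam.AlignedSegment (for illustration only)."""

    def __init__(self, qname, flag):
        self.qname = qname
        self.flag = flag

    def to_string(self):
        # dump the record to a plain, picklable SAM-like text line
        return "%s\t%d" % (self.qname, self.flag)

    @classmethod
    def fromstring(cls, text):
        # rebuild a record from its text form
        qname, flag = text.split("\t")
        return cls(qname, int(flag))


# Instead of queue.put(buffer_out) with the raw objects, send plain strings:
reads = [Read("r1", 99), Read("r2", 147)]
payload = [r.to_string() for r in reads]

# pickling the payload is what multiprocessing.Queue does internally
wire = pickle.dumps(payload)
restored = [Read.fromstring(s) for s in pickle.loads(wire)]

print(restored[0].flag)  # 99 -- no C-level pointers crossed the boundary
```

The point is that only plain Python objects (strings, tuples, ints) travel through the Queue, so nothing depends on C-level state that belongs to the producing process.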

Thanks a lot for your help!

software error Python pysam multiprocessing • 2.0k views

There is an option to allow multiple iterators when opening a SAM or BAM file; I can look it up later when I'm home. (Currently waiting for pizza.) I believe it's called multiple_iterators, and it would allow multiple threads to eat from the same pizza file.

Pizza > pysam