pysam segmentation fault with multiprocessing
4.7 years ago
wolffj • 0

Hello,

I have already posted this as an issue on the pysam GitHub repository, but I have not received an answer so far. Maybe one of you can help.

I have the following issue and do not know whether it is my fault or a bug in pysam:

I read data from a SAM file, store it in a buffer, and hand that buffer over to a new Process from Python's multiprocessing library. Inside the new process I can access the data without any issue, but when I return (via a Queue) a new buffer containing those elements of the old buffer that satisfy certain conditions, I get a segmentation fault.

Here is a code example that should make clearer what is going on.

import pysam
import multiprocessing

def readData(pFileOneIterator, pFileTwoIterator, pNumberOfItemsPerBuffer):
    j = 0
    buffer_1 = []
    buffer_2 = []
    while j < pNumberOfItemsPerBuffer:
        j += 1
        try:
            mate1 = pFileOneIterator.next()
            mate2 = pFileTwoIterator.next()
        except StopIteration:
            break
        # if some conditions on mate1 and mate2 fit
        buffer_1.append(mate1)
        buffer_2.append(mate2)
    return buffer_1, buffer_2

def process_data(pMateBuffer1, pMateBuffer2, pQueue):
    # process data
    i = 0
    buffer_out = []
    while i < len(pMateBuffer1):
        mate1 = pMateBuffer1[i]
        mate2 = pMateBuffer2[i]
        # if some conditions are fulfilled, add
        buffer_out.append(mate1)
        buffer_out.append(mate2)
        i += 1
    # this access works
    print buffer_out[0].flag
    pQueue.put(buffer_out)

def main(args=None):
    # open the alignment files
    str1 = pysam.Samfile('File1', 'rb')
    str2 = pysam.Samfile('File2', 'rb')

    buffer_1, buffer_2 = readData(str1, str2, 500000)
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=process_data, kwargs=dict(
        pMateBuffer1=buffer_1,
        pMateBuffer2=buffer_2,
        pQueue=queue
    ))
    p.start()
    # let the process compute, then get the data back
    result = queue.get()
    # the type is still visible:
    print type(result[0])  # prints <type 'pysam.libcalignedsegment.AlignedSegment'>
    # but this access raises a segmentation fault
    print result[0].flag  # Segmentation fault (core dumped)
    p.join()


Why can I copy the data from process A to B, but it fails to copy the data back from B to A? Am I doing something wrong, and if so, what? To me it looks as if some pointers or references in the underlying data structure are invalidated.
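For context on what a workaround could look like: multiprocessing.Queue pickles everything it transports, so one common pattern is to convert each record to plain Python data before put() and rebuild it after get(). Modern pysam offers AlignedSegment.to_string() and AlignedSegment.fromstring() for this (fromstring() additionally needs a header object). A minimal sketch of the pattern with a hypothetical Read stand-in class, so it runs without pysam:

```python
import pickle


class Read(object):
    """Hypothetical stand-in for pysam.AlignedSegment (for illustration only)."""

    def __init__(self, qname, flag):
        self.qname = qname
        self.flag = flag

    def to_string(self):
        # dump the record to a plain, picklable SAM-like text line
        return "%s\t%d" % (self.qname, self.flag)

    @classmethod
    def fromstring(cls, text):
        # rebuild a record from its text form
        qname, flag = text.split("\t")
        return cls(qname, int(flag))


# Instead of queue.put(buffer_out) with the raw objects, send plain strings:
reads = [Read("r1", 99), Read("r2", 147)]
payload = [r.to_string() for r in reads]

# pickling the payload is what multiprocessing.Queue does internally
wire = pickle.dumps(payload)
restored = [Read.fromstring(s) for s in pickle.loads(wire)]

print(restored[0].flag)  # 99 -- no C-level pointers crossed the boundary
```

The point is that only plain Python objects (strings, tuples, ints) travel through the Queue, so nothing depends on C-level state that belongs to the producing process.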

Thanks a lot for your help!

software error Python pysam multiprocessing • 2.0k views

There is an option to allow multiple iterators when opening a SAM or BAM file; I can look it up later when I'm home. (Currently waiting for pizza.) I believe it's called multiple_iterators, and it would allow multiple threads to eat from the same pizza file.

Pizza > pysam