I want to process both reads in a paired-end read simultaneously, but don't know how to do this efficiently.
I'm currently doing it the following way, using the
pysam Python wrapper for for the htslib API (Samtools).
samfile = AlignmentFile(filename, 'rb') # BAM file reader. # Iterate through reads. for read in samfile: # Check that the read has a pair that is mapped and not a duplicate. if read.is_paired and not read.mate_is_unmapped and not read.is_duplicate: # Get the other read in the pair. read_mate = samfile.mate(read)
This is really slow however.
From what I understand, the
samfile.mate(read) call makes the BAM file reader jump to another place in the BAM. So it sounds like it would be a good idea to have two readers open and using one to iterate through the reads and the other to fetch the mate read, but that doesn't speed things up. I don't know whether paired-end reads are adjacent in a sorted and indexed BAM file, in that I case I would be able to take advantage of that, and do something along the lines of:
reads = samfile.fetch() for i in range(number_of_pairs): read1 = next(reads) read2 = next(reads)
Anyone know how I can do this in a way that won't take forever to run?