How to iterate through paired-end reads in pysam?
10 weeks ago
zt10122 ▴ 20

I want to process both reads of a paired-end read simultaneously, but I don't know how to do this efficiently. If I use `for read in bam_file:` and `bam_file.mate(read)`, each read is iterated twice. Is there an efficient way to iterate through paired-end reads in pysam?

bam sam samtools pysam bioinformation • 270 views

each _alignment_

is there an efficient way to iterate through paired-end reads in pysam?

10 weeks ago

The cheap answer to this is "no". It's a massive headache for many people who write software that processes BAM files. It's not pysam's fault; there simply isn't a good way of doing it on an arbitrarily sorted or position-sorted BAM file.

The slightly more useful answer, as suggested by @Pierre Lindenbaum, is to sort your BAM by name. You can then collect all alignments for a given read by iterating until the read name changes. However, be aware that if you have multiple alignments per read, then it can get difficult working out which alignment goes with which pair (hence @Pierre's suggestion to remove secondary and supplementary alignments). But sometimes you need the secondary alignments, and sometimes you need the file position-sorted. In which case, you are basically stuffed.
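
For the name-sorted case, a minimal sketch of the "iterate until the read name changes" idea might look like the following. It assumes the BAM has already been name-sorted (for example with `samtools sort -n`), and it drops secondary and supplementary alignments so that at most one primary alignment per mate remains. The function names `iter_pairs` and `iter_pairs_from_bam` are made up for this example and are not part of pysam.

```python
from itertools import groupby
from operator import attrgetter


def iter_pairs(alignments):
    """Yield (read1, read2) tuples from alignments grouped by query name.

    Assumes the input is name-sorted, so all alignments of one template
    are adjacent. Secondary and supplementary alignments are skipped.
    Templates where one primary mate is missing are not yielded.
    """
    for _name, group in groupby(alignments, key=attrgetter("query_name")):
        read1 = read2 = None
        for aln in group:
            if aln.is_secondary or aln.is_supplementary:
                continue
            if aln.is_read1:
                read1 = aln
            elif aln.is_read2:
                read2 = aln
        if read1 is not None and read2 is not None:
            yield read1, read2


def iter_pairs_from_bam(path):
    """Open a name-sorted BAM with pysam and yield primary mate pairs."""
    import pysam  # imported here so iter_pairs itself does not need pysam

    with pysam.AlignmentFile(path, "rb") as bam:
        # until_eof=True reads records in file order and needs no index
        yield from iter_pairs(bam.fetch(until_eof=True))
```

Usage would be something like `for r1, r2 in iter_pairs_from_bam("namesorted.bam"): ...`, where the file name is a placeholder. This does nothing for the position-sorted or secondary-alignment cases mentioned above; it only covers the simple case where one primary alignment per mate is enough.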