I am trying to use pysam to extract the CIGAR string from a bam file of aligned long-read RNA sequencing data. I want to extract the CIGAR string from only a certain region in the read, NOT the CIGAR string of the entire read that overlaps this region. The reason for this is that I am working with long read sequencing data, so the CIGAR string is quite long, and I want to look at the string surrounding each intron in each read.
As input, I have the aligned bam file and a bed file of the introns (+/- a window) that I am interested in.
I have currently been trying to use pysam to extract the CIGAR string from an intron as follows:
import pysam bamfile = pysam.AlignmentFile("wt_reads.bam", "rb") for read in bamfile.fetch('chr9', 108339540, 108339796): print(read.cigarstring)
This outputs a list of CIGAR strings for reads overlapping the region in .fetch(), for example:
354M218N187M 283M218N559M 283M218N141M
However, this is the CIGAR string for entire reads (with lengths of 759, 1060, and 642 in the above example). I would like only the CIGAR string of the 256 nt region chr9:108339540-108339796. Eventually I would like to return the CIGAR strings for all regions (introns) in the bed file.
Is there a way to do this with pysam or other tools?