Question: Extracting CIGAR string from a specific region of a read in a bam file using pysam?
gravatar for kiki4133
17 months ago by
kiki41330 wrote:

I am trying to use pysam to extract the CIGAR string from a bam file of aligned long-read RNA sequencing data. I want to extract the CIGAR string from only a certain region in the read, NOT the CIGAR string of the entire read that overlaps this region. The reason for this is that I am working with long read sequencing data, so the CIGAR string is quite long, and I want to look at the string surrounding each intron in each read.

As input, I have the aligned bam file and a bed file of the introns (+/- a window) that I am interested in.

I have currently been trying to use pysam to extract the CIGAR string from an intron as follows:

import pysam
bamfile = pysam.AlignmentFile("wt_reads.bam", "rb") 

for read in bamfile.fetch('chr9', 108339540, 108339796):

This outputs a list of CIGAR strings for reads overlapping the region in .fetch(), for example:


However, this is the CIGAR string for entire reads (with lengths of 759, 1060, and 642 in the above example). I would like only the CIGAR string of the 256 nt region chr9:108339540-108339796. Eventually I would like to return the CIGAR strings for all regions (introns) in the bed file.

Is there a way to do this with pysam or other tools?

rna-seq pysam bam cigar • 1.2k views
ADD COMMENTlink modified 17 months ago • written 17 months ago by kiki41330


no solution but a more general procedure to get you result: You must take the mapping starting position into account and calculate at which position in the read's sequence your region of interest starts and ends. According to this you have to manipulate/trim your CIGAR.

fin swimmer

ADD REPLYlink written 17 months ago by finswimmer13k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1735 users visited in the last hour