I'm trying to create a program that can count the number of reads in a BAM file that overlap a specific genomic coordinate(for example, chr15:28365618) and which nucleotides are present in those reads(with a minimum base quality of 30). This is my program so far:
import pysam bamfile=pysam.AlignmentFile("file1.bam","rb") for read in bamfile.fetch("chr15",28365618,28365619) a=0 c=0 g=0 t=0 n=0 pos=read.reference.start nucl=read.query_sequence[28365618-pos] for i in nucl if i=="A": a+=1 elif i=="C": c+=1 elif i=="G": g+=1 elif i=="T": t+=1 elif i=="N": n+=1 print a+c+g+t+n #total no. of reads print "A;",a print "C:" c print "G",g print "T",t print "N",n
However,my output is way off when I compare it to IGV, and I can't quite figure out why.Maybe because of duplicate reads? Any help will be greatly appreciated.