Hi. I am trying to retrieve exact breakpoint positions from a CIGAR string using Python. I know how to use a regex to retrieve the integers that precede the letter denoting an insertion or deletion. How can I sum these integers, or do anything else, so that my script reports the start and end position of my indel?
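import re

a = '12M1I'
# Test each letter explicitly; `'I' or 'M' in a` is always true
if 'I' in a or 'M' in a:
    # Lengths of all insertion operations
    matchI = re.findall(r'(\d+)I', a)
    intlistI = [int(x) for x in matchI]
    print(matchI)
    # Lengths of all match operations
    matchM = re.findall(r'(\d+)M', a)
    intlistM = [int(x) for x in matchM]
    print(matchM)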
or just simply:
match = re.findall(r'(\d+)(\w)', a)
print(match)
Hmm, that's not really helpful. I'm already using pysam later in my script, and I already have the reports of whether I have an insertion or a deletion. I only want the script to report the exact location of my indel: given the starting position of the read alignment, what is the position of the indel? Example: I have a read with CIGAR 10M1I10M1D and a starting position of, let's say, 10, and I would like this output:
1 insertion 20
1 deletion 31
Store the initial position, read.pos, in a variable. For each match or deletion, increase the variable by the CIGAR operation length, and report each insertion or deletion as you reach it.
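A minimal sketch of that walk, working directly on the CIGAR string. The function name `indel_positions` and the tuple layout are made up for illustration; the set of reference-consuming operations (M, D, N, =, X) follows the SAM specification. Positions are reported in the same coordinate system as the start value passed in.

```python
import re

# Operations that advance the position on the reference (SAM specification).
REF_CONSUMING = set("MDN=X")

def indel_positions(cigar, start):
    """Return (operation, length, reference_position) for every I and D.

    `start` is the reference coordinate of the first aligned base.
    """
    events = []
    pos = start  # running reference position, updated as we walk the CIGAR
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op in "ID":
            # Report the indel at the current reference position
            events.append((op, length, pos))
        if op in REF_CONSUMING:
            # Only these operations move us along the reference
            pos += length
    return events

print(indel_positions("10M1I10M1D", 10))
# [('I', 1, 20), ('D', 1, 30)]
```

With this convention an insertion sits immediately before the reported reference position, and a deletion spans from the reported position to position + length - 1. The deletion comes out at 30 rather than 31 because the inserted base does not occupy a reference position; if you want read-based offsets instead, you would count I and not D.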
Yes, the problem is simple:

a = '10M1I10M1D'
pos = 10
match = re.findall(r'(\d+)(\w)', a)
print(match)
for i in match:
    # print(i[0], i[1])
    posindel = pos + int(i[0])
    print(posindel, i[0], i[1])
It does not report the correct position of each subsequent event.
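The loop above adds each length to the fixed start `pos` rather than to a running total, so every event is measured from the start of the alignment. Since pysam is already in use, here is a hedged sketch of the same walk over `(operation, length)` pairs in the form returned by pysam's `read.cigartuples`. The function name `indels_from_cigartuples` is hypothetical, and the example list is written out by hand so the snippet runs without a BAM file.

```python
# pysam CIGAR operation codes: 0=M, 1=I, 2=D, 3=N, 4=S, 5=H, 6=P, 7==, 8=X
INSERTION, DELETION = 1, 2
REF_CONSUMING_CODES = {0, 2, 3, 7, 8}  # M, D, N, =, X

def indels_from_cigartuples(cigartuples, reference_start):
    """Yield (kind, length, reference_position) for each insertion/deletion."""
    pos = reference_start  # running total, not the fixed start
    for op, length in cigartuples:
        if op == INSERTION:
            yield ("insertion", length, pos)
        elif op == DELETION:
            yield ("deletion", length, pos)
        # Insertions do not advance the reference position
        if op in REF_CONSUMING_CODES:
            pos += length

# Hand-written equivalent of CIGAR 10M1I10M1D
example = [(0, 10), (1, 1), (0, 10), (2, 1)]
for kind, length, pos in indels_from_cigartuples(example, 10):
    print(length, kind, pos)
```

In a real script you would pass `read.cigartuples` and `read.reference_start`; note that pysam coordinates are 0-based.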
I am confused: is it a CIGAR problem or a problem with the program?