Hello everyone,
I am interested in detecting deletion events in sequencing data, more specifically PacBio data. I searched for a simple way to detect deletions in a SAM file and extract their positions, but despite the apparent simplicity of the problem I could not find anything.
As input I have a SAM file that contains, in particular: 1) the position of the first aligned base of each read (the 1-based POS field) and 2) the CIGAR string, where M stands for an alignment match (which can be either a base match or a mismatch), I for an insertion and D for a deletion.
For each read in my SAM file, I want to get the start and end positions of every deletion.
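To make the CIGAR semantics concrete, here is a minimal stdlib-only sketch that splits a CIGAR string into (length, operation) pairs. Per the SAM specification, M, D, N, = and X advance along the reference, while I, S, H and P do not. The helper names `parse_cigar` and `reference_end` are hypothetical, not part of any library.

```python
import re

# One CIGAR operation: a run length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume reference bases (SAM specification).
REF_CONSUMING = set("MDN=X")


def parse_cigar(cigar):
    """Split a CIGAR string such as '200M200D300M' into (length, op) pairs."""
    # Reject '*' (no alignment) and anything that is not a clean run of ops.
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"invalid CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def reference_end(pos, cigar):
    """1-based inclusive reference coordinate of the last aligned base."""
    span = sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)
    return pos + span - 1
```

For example, `parse_cigar("200M200D300M")` gives `[(200, 'M'), (200, 'D'), (300, 'M')]`, and `reference_end(1000, "200M200D300M")` is 1699, because 200 + 200 + 300 = 700 reference bases are covered starting at 1000. Insertions and soft clips would not move the end.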
Input :
pos CIGAR
1000 200M200D300M
Output :
deletion_start deletion_end
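1200 1399
(200M covers reference positions 1000 to 1199, so the 200 deleted bases are positions 1200 to 1399.)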
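The core logic could be a single walk along the CIGAR: keep a reference cursor starting at POS, advance it on every reference-consuming operation, and record an interval whenever the operation is D. A minimal sketch, assuming 1-based inclusive coordinates (as SAM POS is); `deletion_intervals` and its `min_len` filter are hypothetical names chosen for illustration.

```python
import re

# Operations that move the reference cursor (SAM specification).
REF_CONSUMING = set("MDN=X")


def deletion_intervals(pos, cigar, min_len=1):
    """Return (start, end) pairs, 1-based inclusive, for each D operation.

    pos is the SAM POS field (1-based position of the first aligned
    reference base); min_len optionally drops deletions shorter than it.
    """
    intervals = []
    ref = pos  # reference coordinate of the next base to be consumed
    for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(n)
        if op == "D" and n >= min_len:
            intervals.append((ref, ref + n - 1))
        if op in REF_CONSUMING:
            ref += n
    return intervals
```

With the example above, `deletion_intervals(1000, "200M200D300M")` returns `[(1200, 1399)]`. The `min_len` parameter is there because raw PacBio alignments are likely to contain many short deletions caused by sequencing errors, which you may want to filter out.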
I suspect this can be done with a few lines of Python, but I am just learning the language. Once I have a table of deletions I can take it from there, since I know R much better, but for this step I need your help. If there is an existing tool that does exactly this and I missed it, even better!
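A minimal end-to-end sketch that reads a plain-text SAM file with the standard library only and writes a tab-separated table that R can load. It assumes the standard 11 mandatory SAM columns, skips header lines, unmapped reads (FLAG bit 4) and records with CIGAR `*`. The script name `sam_deletions.py` and the output column names are hypothetical.

```python
import csv
import re
import sys

# Operations that consume reference bases (SAM specification).
REF_CONSUMING = set("MDN=X")


def sam_deletions(lines, min_len=1):
    """Yield (read, chrom, start, end, length) for each deletion in SAM lines."""
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 11:
            continue  # blank or malformed line
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 4 or cigar == "*":
            continue  # unmapped read or no alignment
        ref = pos  # 1-based reference coordinate of the next consumed base
        for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
            n = int(n)
            if op == "D" and n >= min_len:
                yield qname, rname, ref, ref + n - 1, n
            if op in REF_CONSUMING:
                ref += n


def main(path):
    writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
    writer.writerow(["read", "chrom", "deletion_start", "deletion_end", "length"])
    with open(path) as sam:
        writer.writerows(sam_deletions(sam))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as `python sam_deletions.py input.sam > deletions.tsv`, then load the result in R with `read.delim("deletions.tsv")`.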
Thank you very much.