I have tried a "home-made" method to extract the number of nucleotides per read from a SAM file, but it does not work correctly for all reads, which I guess is due to deletions.
grep -e "pattern" my.sam | cut -f 2,3,4,5 > output.txt
To explain, I search for reads matching a pattern and then extract some information from every read: Chromosome, Position, Strand, and Sequence. Then I use R to count the characters in the "Sequence" column, which gives me the number of nucleotides per read. However, the sequence sometimes contains a deletion ("-"), which is counted as a character, so some reads end up with the wrong length. I don't want to discard these reads. My alignment parameters allow only 1 mismatch, so I expect at most a single deletion or insertion per read, if that matters.
Is there any way to use the CIGAR string of each read to extract the correct number of aligned nucleotides per read?
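A minimal Python sketch of the kind of CIGAR-based count I have in mind, assuming the standard SAM layout (tab-separated, CIGAR in column 6) and the operation semantics from the SAM specification. The function names `cigar_lengths` and `lengths_from_sam_line` are made up for illustration, and which of the three numbers is the "correct" one depends on whether inserted or soft-clipped bases should count.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Per the SAM specification:
QUERY_OPS = set("MIS=X")    # operations that consume bases of the read
REF_OPS = set("MDN=X")      # operations that consume bases of the reference
ALIGNED_OPS = set("M=X")    # read bases aligned to a reference base


def cigar_lengths(cigar):
    """Return (read_length, reference_span, aligned_bases) for a CIGAR string."""
    if cigar == "*":  # CIGAR unavailable, e.g. unmapped read
        return (0, 0, 0)
    read_len = ref_len = aligned = 0
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op in QUERY_OPS:
            read_len += n
        if op in REF_OPS:
            ref_len += n
        if op in ALIGNED_OPS:
            aligned += n
    return (read_len, ref_len, aligned)


def lengths_from_sam_line(line):
    """Return (read_name, lengths) for one SAM alignment line (not a header)."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], cigar_lengths(fields[5])  # column 6 is CIGAR


print(cigar_lengths("20M1D15M"))  # deletion: (35, 36, 35)
print(cigar_lengths("10M1I25M"))  # insertion: (36, 35, 35)
```

A deletion (`D`) adds to the reference span but not to the read length, so a read with one deletion still reports its true number of nucleotides.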
Thanks for your time, Ioannis