The alignment section of a SAM/BAM file contains 11 mandatory fields such as PNEXT.
POS in column 4 gives:
1-based leftmost mapping POSition
PNEXT in column 8 gives:
Position of the mate/next read
So, my goal is to calculate the distance between read pairs in Python by measuring the distance between the end position of a read and the end position of its mate. Basically --->distance<---
Thus, I wonder if someone knows what position PNEXT gives exactly. Is it the start or the end position of its mate? And if it's the start position, how can I get the end position?
What Devon said.
Once you have the read, you'll have to walk over the CIGAR string. Here is the code from HTSJDK:
There are other CIGAR operations, though... that code doesn't look very robust. It will return an incorrect answer for any read with an insertion, soft-clipping, etc.
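For what it's worth, here is a minimal Python sketch of the CIGAR walk that tries to cover all the operations. It relies on the SAM spec's rule that only M, D, N, = and X consume reference bases, so insertions, soft/hard clips and padding don't move the end position. The function name `reference_end` is just something I made up for illustration, not a library call. Note that PNEXT is the mate's POS (its 1-based leftmost position), so to get the mate's end you need the mate's own CIGAR, either from the mate's record or from the optional MC tag if your aligner writes it.

```python
import re

# CIGAR operations that consume reference bases according to the SAM spec.
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_end(pos, cigar):
    """Return the 1-based, inclusive rightmost reference position.

    pos   -- POS field (1-based leftmost mapping position)
    cigar -- CIGAR field, e.g. "5S10M2I3D20M"
    """
    if cigar == "*":
        raise ValueError("CIGAR is unavailable ('*'), cannot compute the end")
    ops = CIGAR_RE.findall(cigar)
    # Make sure the regex consumed the whole string (no unknown operations).
    if "".join(length + op for length, op in ops) != cigar:
        raise ValueError("malformed CIGAR string: %r" % cigar)
    # Sum only the operations that advance along the reference.
    ref_len = sum(int(length) for length, op in ops if op in REF_CONSUMING)
    return pos + ref_len - 1


if __name__ == "__main__":
    # 10M + 3D + 20M span 33 reference bases; the clip and insertion do not count.
    print(reference_end(100, "5S10M2I3D20M"))
```

The same function works for the mate: call it with PNEXT and the mate's CIGAR string.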
One of the weaknesses of the SAM format is that it does not tell you the stop position of reads. If you map with BBMap, you can use the "stoptag" flag to get this information in a custom field, though.