SAM files alignment section: PNEXT
1
0
Entering edit mode
7.1 years ago
AlchemistMoz ▴ 40

The alignment section of a SAM/BAM file contains 11 mandatory fields such as PNEXT.

POS in column 4 gives:

1-based leftmost mapping POSition

PNEXT in column 8 gives:

Position of the mate/next read

So, my goal is to calculate the distance between read-pairs in python by measuring the distance between the end position of a read and the end position of it's mate. Basically --->distance<---

Thus, I wonder if someone knows what position PNEXT gives exactly? Is it the start or end position of it's mate. And if it's the start position, how can I get the end position?

SAM BAM samtools • 3.3k views
ADD COMMENT
0
Entering edit mode

What Devon said.

how can I get the end position?

once you have the read ,you'll have to walk over the CIGAR string; Here is the code from HTSJDK:

 mAlignmentStart + getCigar().getReferenceLength() 

class Cigar { (...)
    public int getReferenceLength() {
        int length = 0;
        for (final CigarElement element : cigarElements) {
            switch (element.getOperator()) {
                case M:
                case D:
                case N:
                case EQ:
                case X:
                    length += element.getLength();
                    break;
                default: break;
            }
        }
        return length;
}
ADD REPLY
0
Entering edit mode

There are other cigar operations, though... that code doesn't look very robust. It will return an incorrect answer for any read with an insertion, soft-clipping, etc.

ADD REPLY
0
Entering edit mode

One of the weaknesses of the sam format is that it does not tell you the stop position of reads. If you map with BBMap, you can use the "stoptag" flag to get this information in a custom field, though.

ADD REPLY
2
Entering edit mode
7.1 years ago

PNEXT is the start position of the mate. If you absolutely must know the mate inner distance then you may need to look at the mate to find out where it ends (N.B., you will only need to do this if the mate precedes the read you're looking at, since otherwise you can just use PNEXT). Having said all of that, you have to ask yourself if that's really worth it. The TLEN field is there and you can just subtract 2*read length to get an approximate value.

ADD COMMENT
0
Entering edit mode

Thx for the reply! When you say that I can substract 2*read length, substract from what?

Let's say I have a read with POS 39312, PNEXT 39572 and TLEN 101. (CIGAR = 101M).

I can add 101 to 39312 to get it's approximate end pos and then with PNEXT, should I substract 101 from it? (Because Read-pairs go in opposite directions. I'm guessing PNEXT is the 1-based left-most position as well.) Substracting one from the other (39471-39413) would then give me a read-pair distance of 58. Is this how you meant?

ADD REPLY
0
Entering edit mode

What aligner produced a TLEN of 101 for that? It's wrong. Assuming the mate is also 101 bases and all of it maps then the TLEN field should be 360, since the mate should end around 39672.

ADD REPLY
0
Entering edit mode

That's strange, don't know why TLEN is too short. I didn't align this BAM file. But I see that you calculated the distance to ca 158 bases and that you count towards the right for the mate to get 39672. Shouldn't it be the other way around, that is, should the mate end at ca 39471 instead?

ADD REPLY
1
Entering edit mode

No, positions always refer to the left-most position on the + strand. Stated differently, alignments always extend to the right and are always in reference to the + strand.

ADD REPLY

Login before adding your answer.

Traffic: 2476 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6