Question: SAM files alignment section: PNEXT
0
gravatar for AlchemistMoz
3.1 years ago by
AlchemistMoz20
AlchemistMoz20 wrote:

The alignment section of a SAM/BAM file contains 11 mandatory fields such as PNEXT.

POS in column 4 gives:

1-based leftmost mapping POSition

PNEXT in column 8 gives:

Position of the mate/next read

So, my goal is to calculate the distance between read-pairs in python by measuring the distance between the end position of a read and the end position of it's mate. Basically --->distance<---

Thus, I wonder if someone knows what position PNEXT gives exactly? Is it the start or end position of it's mate. And if it's the start position, how can I get the end position?

sam samtools bam • 1.0k views
ADD COMMENTlink modified 3.1 years ago by Devon Ryan94k • written 3.1 years ago by AlchemistMoz20

What Devon said.

how can I get the end position?

once you have the read ,you'll have to walk over the CIGAR string; Here is the code from HTSJDK:

 mAlignmentStart + getCigar().getReferenceLength() 

class Cigar { (...)
    public int getReferenceLength() {
        int length = 0;
        for (final CigarElement element : cigarElements) {
            switch (element.getOperator()) {
                case M:
                case D:
                case N:
                case EQ:
                case X:
                    length += element.getLength();
                    break;
                default: break;
            }
        }
        return length;
}
ADD REPLYlink written 3.1 years ago by Pierre Lindenbaum127k

There are other cigar operations, though... that code doesn't look very robust. It will return an incorrect answer for any read with an insertion, soft-clipping, etc.

ADD REPLYlink written 3.1 years ago by Brian Bushnell17k

One of the weaknesses of the sam format is that it does not tell you the stop position of reads. If you map with BBMap, you can use the "stoptag" flag to get this information in a custom field, though.

ADD REPLYlink written 3.1 years ago by Brian Bushnell17k
2
gravatar for Devon Ryan
3.1 years ago by
Devon Ryan94k
Freiburg, Germany
Devon Ryan94k wrote:

PNEXT is the start position of the mate. If you absolutely must know the mate inner distance then you may need to look at the mate to find out where it ends (N.B., you will only need to do this if the mate precedes the read you're looking at, since otherwise you can just use PNEXT). Having said all of that, you have to ask yourself if that's really worth it. The TLEN field is there and you can just subtract 2*read length to get an approximate value.

ADD COMMENTlink written 3.1 years ago by Devon Ryan94k

Thx for the reply! When you say that I can substract 2*read length, substract from what?

Let's say I have a read with POS 39312, PNEXT 39572 and TLEN 101. (CIGAR = 101M).

I can add 101 to 39312 to get it's approximate end pos and then with PNEXT, should I substract 101 from it? (Because Read-pairs go in opposite directions. I'm guessing PNEXT is the 1-based left-most position as well.) Substracting one from the other (39471-39413) would then give me a read-pair distance of 58. Is this how you meant?

ADD REPLYlink written 3.1 years ago by AlchemistMoz20

What aligner produced a TLEN of 101 for that? It's wrong. Assuming the mate is also 101 bases and all of it maps then the TLEN field should be 360, since the mate should end around 39672.

ADD REPLYlink written 3.1 years ago by Devon Ryan94k

That's strange, don't know why TLEN is too short. I didn't align this BAM file. But I see that you calculated the distance to ca 158 bases and that you count towards the right for the mate to get 39672. Shouldn't it be the other way around, that is, should the mate end at ca 39471 instead?

ADD REPLYlink written 3.1 years ago by AlchemistMoz20
1

No, positions always refer to the left-most position on the + strand. Stated differently, alignments always extend to the right and are always in reference to the + strand.

ADD REPLYlink written 3.1 years ago by Devon Ryan94k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1134 users visited in the last hour