There are some aspects of the SAM/BAM format that I still do not understand, one of which is what the RNEXT/PNEXT fields of secondary alignments should point to. The spec describes PNEXT as follows:
PNEXT: Position of the primary alignment of the NEXT read in the template. Set as 0 when the information is unavailable. This field equals POS at the primary line of the next read.
This quote changes the way I see the SAM format entirely. Here's a diagram to help:
Alignments 1 and 3 are from the same read, as are alignments 2 and 4. The aligner can put these four alignments into two "chains": 1->2, because the distance between the reads is sensible and they are on the same chromosome, and 3->4 for the same reasons. Chaining 1->4 or 3->2 is not possible because the distances are too small or too large, or perhaps because the alignments are on different chromosomes.
In this situation, I would expect an aligner to give all 4 alignments the same QNAME, with 1 and 2 as the primary alignments and 3 and 4 as secondary alignments. Arrows show where the PNEXT of each alignment points to. This forms the two chains, both of which are properly mapped in a pair (say the aligner doesn't think being secondary prevents being properly mapped).
The way this quote reads, however, is that all secondary alignments should point to the primary alignment of the next read in the fragment/template, as shown in the second illustration. Alignments 3 and 4 cannot have the "properly mapped in a pair" flag set. The knowledge that alignments 3 and 4 would be properly mapped in a pair if alignments 1 and 2 didn't exist is not encodable within the SAM spec, and must be deduced by programs interpreting the whole alignment group (all alignments with the same QNAME).
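To make the two readings concrete, here is a minimal Python sketch of both. Everything in it is made up for illustration: the coordinates, the `Aln` class and the helper names are hypothetical, and only the FLAG bit values (0x1, 0x40, 0x80, 0x100) come from the SAM spec. It builds the four alignments twice, once with the "chained" PNEXT values I originally expected and once following the quote literally, and then checks which of the two satisfies the quoted rule.

```python
from dataclasses import dataclass

# FLAG bits as defined in the SAM spec.
PAIRED = 0x1
READ1 = 0x40
READ2 = 0x80
SECONDARY = 0x100


@dataclass
class Aln:
    """A stripped-down alignment record (hypothetical, illustration only)."""
    flag: int
    rname: str
    pos: int
    rnext: str = "*"   # stored as a full name here; real SAM may write "="
    pnext: int = 0


def make_alignments():
    # Hypothetical coordinates: 1 and 2 are primary, 3 and 4 are secondary.
    return {
        1: Aln(PAIRED | READ1, "chr1", 1000),
        2: Aln(PAIRED | READ2, "chr1", 1300),
        3: Aln(PAIRED | READ1 | SECONDARY, "chr2", 5000),
        4: Aln(PAIRED | READ2 | SECONDARY, "chr2", 5300),
    }


def point_to(alns, targets):
    """Set RNEXT/PNEXT of each source alignment to RNAME/POS of its target."""
    for src, dst in targets.items():
        alns[src].rnext = alns[dst].rname
        alns[src].pnext = alns[dst].pos
    return alns


# Scenario 1: two independent chains, 1<->2 and 3<->4.
chained = point_to(make_alignments(), {1: 2, 2: 1, 3: 4, 4: 3})

# Scenario 2: the quote read literally, every alignment points at the
# primary alignment of the other read.
literal = point_to(make_alignments(), {1: 2, 2: 1, 3: 2, 4: 1})


def follows_quoted_rule(alns):
    """True if every RNEXT/PNEXT equals RNAME/POS of the mate's primary line."""
    primary = {
        a.flag & (READ1 | READ2): a
        for a in alns.values()
        if not a.flag & SECONDARY
    }
    for a in alns.values():
        mate_bit = READ2 if a.flag & READ1 else READ1
        mate = primary[mate_bit]
        if (a.rnext, a.pnext) != (mate.rname, mate.pos):
            return False
    return True


print(follows_quoted_rule(chained))
print(follows_quoted_rule(literal))
```

In the chained version, alignment 3 points at alignment 4, which is not the primary line of the other read, so the check fails. Only the literal version passes, which is why I suspect my chaining idea does not match the spec.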
If anyone could clarify which scenario is correct, that would really help me figure out what data in SAM format is supposed to look like. It seems like this notion I had of chaining together alignments with RNEXT/PNEXT is totally false?
All the best,