Hello friends,
I often work with SAM/BAM formatted short read alignments in a context where it is beneficial to know all alignments for each read rather than best/primary/secondary. My question:
Among short read alignment tools and their configurations, is there ever a case where alignments for multi-mapping reads would be reported in non-adjacent order? I am aware that some tools support sorting outputs by coordinate, but I would expect this to be easily determined from an @HD SO:coordinate header. I am more concerned with default, unsorted outputs.
The reason I ask is that during read quantification I need to know how many alignments were produced for each multi-mapping read. This is cheap and easy to determine when the alignments are listed adjacent to one another, and intuitively it would make sense for this to be the case. However, of the documentation I've read, only STAR's explicitly mentions adjacency. I primarily use bowtie, but I want my scripts to support other short read alignment tools like HISAT2, Bowtie 2, and BWA.
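To make the "cheap and easy" case concrete, here is a minimal sketch of the kind of counting I mean. It assumes plain-text SAM lines, single-end reads, and that every record for a read sits next to the others. The function name count_adjacent_alignments is just illustrative, not from any library.

```python
from itertools import groupby


def count_adjacent_alignments(sam_lines):
    """Yield (read_name, n_alignments) for each run of adjacent records
    that share a QNAME. Assumes single-end reads in SAM text format."""
    # Skip header lines (they start with '@') and blank lines.
    records = (
        line.rstrip("\n").split("\t")
        for line in sam_lines
        if line.strip() and not line.startswith("@")
    )
    # groupby only merges *consecutive* records with the same QNAME,
    # so this is correct only if alignments for a read are adjacent.
    for qname, group in groupby(records, key=lambda fields: fields[0]):
        # FLAG bit 0x4 marks an unmapped record; don't count it as an alignment.
        n_aligned = sum(1 for fields in group if not int(fields[1]) & 0x4)
        yield qname, n_aligned
```

If alignments were not adjacent, the same read name would simply be yielded more than once with partial counts, which is exactly the failure I'm worried about.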
I'm aware of the NH:i field, but it isn't reported by all tools (for example, bowtie instead reports XM:i, which has other complications, and I believe X0 + X1 is one approach with BWA). The closest guarantee for adjacency seems to be @HD headers reporting SO:queryname or GO:query, but I would prefer to skip samtools sort -n if I can assume adjacency for files that don't report SO:coordinate or GO:reference.
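For reference, the header check I have in mind looks roughly like the sketch below. It only reads the SO and GO tags on the @HD line. The function name adjacency_from_header and the three-way True/False/None return are my own conventions, and the None case is precisely the one this question is about.

```python
def adjacency_from_header(header_lines):
    """Return True if the @HD line guarantees that records are grouped by
    read name, False if it declares coordinate/reference ordering, and
    None if the header makes no statement either way."""
    for line in header_lines:
        if not line.startswith("@HD"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        tags = dict(
            field.split(":", 1)
            for field in line.rstrip("\n").split("\t")[1:]
            if ":" in field
        )
        sort_order = tags.get("SO")
        grouping = tags.get("GO")
        if sort_order == "queryname" or grouping == "query":
            return True
        if sort_order == "coordinate" or grouping == "reference":
            return False
        return None  # e.g. SO:unsorted or SO:unknown: no guarantee stated
    return None  # no @HD line at all
```

With something like this, I would only fall back to samtools sort -n when the result is False, and would like to know whether None can safely be treated as adjacent.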