Question: What are the semantics of containing bwa split read alignments?
3.6 years ago, d-cameron (2.2k) wrote:

I'm parsing the output of bwa (0.7.15) and I'm getting some split read alignments that seem very strange. For example, take the following redacted record:

ReadName 2115 chrUn_JTFH01001478v1_decoy 12 1 250M chr21 10325621 0 * * NM:i:13 SA:Z:chr21,10325808,-,240M10S,9,19;

The read is aligned to both chr21 and the decoy contig. What I find strange is that although the read is reported as a split alignment with two SAM records, the entire read aligns to the supplementary alignment position.
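For reference, the FLAG value 2115 in the record above can be decoded with a short sketch like the following. The bit names in `SAM_FLAGS` are my own labels for the bits defined in the SAM specification, and `decode_flag` is a hypothetical helper, not part of bwa or samtools.

```python
# FLAG bits as defined in the SAM specification; the names are illustrative labels.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all FLAG bits set in `flag` (hypothetical helper)."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


print(decode_flag(2115))
# ['paired', 'proper_pair', 'first_in_pair', 'supplementary']
```

So this record is the supplementary line (0x800) and, since 0x10 is not set, it is aligned on the forward strand.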

What is the meaning of such a record? How can a genuinely split read contain an alignment that is essentially not split? Is this a bug in bwa? An artefact of alt contig mapping?

Edit: the SAM specifications have the following definition for the SA tag:

SA:Z:(rname,pos,strand,CIGAR,mapQ,NM;)+ Other canonical alignments in a chimeric alignment, formatted as a semicolon-delimited list. Each element in the list represents a part of the chimeric alignment. Conventionally, at a supplementary line, the first element points to the primary line.
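Going by that definition, the SA value can be split into its fields with something like the sketch below. `SAEntry` and `parse_sa` are hypothetical names introduced here for illustration; the input is the tag value with the `SA:Z:` prefix already removed.

```python
from typing import List, NamedTuple


class SAEntry(NamedTuple):
    """One element of an SA tag: rname,pos,strand,CIGAR,mapQ,NM."""
    rname: str
    pos: int
    strand: str
    cigar: str
    mapq: int
    nm: int


def parse_sa(value: str) -> List[SAEntry]:
    """Parse the value of an SA tag (without the 'SA:Z:' prefix)."""
    entries = []
    for element in value.split(";"):
        if not element:
            # The list is semicolon-terminated, so the last piece is empty.
            continue
        rname, pos, strand, cigar, mapq, nm = element.split(",")
        entries.append(SAEntry(rname, int(pos), strand, cigar, int(mapq), int(nm)))
    return entries


entries = parse_sa("chr21,10325808,-,240M10S,9,19;")
print(entries[0])
```

For the record in the question this gives a single entry: chr21, position 10325808, reverse strand, CIGAR 240M10S, mapQ 9, NM 19.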

bwa split read sa • 1.5k views
modified 3.1 years ago by Biostar • written 3.6 years ago by d-cameron
3.6 years ago, Pierre Lindenbaum (131k), France/Nantes/Institut du Thorax - INSERM UMR1087, wrote:

IMHO this is another alignment with a poor mapping quality and a large number of mismatches (NM=13). I would imagine that the primary alignment has a better mapping quality and a lower number of mismatches.

written 3.6 years ago by Pierre Lindenbaum

The primary alignment is given in the SA tag: 19 mismatches (10 bases soft clipped), nominal mapq of 9.

The issue is that bwa is reporting it as a split read alignment using the SA tag, not as an alternate alignment using the XA tag. There are also 6 alternate alignments in the XA tag, which I removed for clarity: they are different alignment possibilities and do not form part of a single split alignment.

One would expect split reads to have CIGARs something like "25M75S" on one record, and "25S75M" on the other because the aligner is reporting a split read alignment. In this case, the aligner is reporting a split read in which one of the alignments is not split at all.
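To make the "containing" part concrete, here is a minimal sketch that converts each alignment's CIGAR and strand into the interval of the original read it covers, then checks whether one interval contains the other. `read_interval` and `contains` are hypothetical helpers. The code assumes clipping only occurs at the ends of the CIGAR and that coordinates are 0-based, half-open, in the orientation of the read as sequenced.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def read_interval(cigar, strand):
    """Return (start, end) of the aligned bases in original read coordinates."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Operations that consume read bases (hard clips counted so that
    # coordinates refer to the full original read).
    read_len = sum(n for n, op in ops if op in "MIS=XH")
    lead = 0
    for n, op in ops:
        if op not in "SH":
            break
        lead += n
    trail = 0
    for n, op in reversed(ops):
        if op not in "SH":
            break
        trail += n
    if strand == "-":
        # The CIGAR describes the reverse-complemented read, so the
        # clipped ends swap when mapped back to the original orientation.
        lead, trail = trail, lead
    return lead, read_len - trail


def contains(outer, inner):
    """True if interval `outer` fully covers interval `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# Expected shape of a split read: two disjoint pieces.
print(read_interval("25M75S", "+"), read_interval("25S75M", "+"))  # (0, 25) (25, 100)

# The record in question: supplementary 250M (forward) vs primary 240M10S (reverse).
supp = read_interval("250M", "+")
prim = read_interval("240M10S", "-")
print(supp, prim, contains(supp, prim))  # (0, 250) (10, 250) True
```

Under those assumptions, the supplementary alignment covers read bases 0 to 250 and the primary covers 10 to 250, so the supplementary record fully contains the primary instead of complementing it.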

modified 3.6 years ago • written 3.6 years ago by d-cameron