Why The Properly Paired Results From Bwa Mem Is An Odd Number
2
2
Entering edit mode
9.6 years ago
liupfskygre ▴ 210

Hi following is the results of samtools flagstat results. I use bwa mem default command to do align! my question is why the properly paired is odd number?

25700926 mapped

25744459 + paired in sequencing

25445603 properly paired

25692729 with itself and mate mapped

8197 singletons

my reads are from a archaea RNA-seq experiment.

and how to get unique mapped read-pairs from the properly paired?

thanks!

paired • 8.2k views
5
Entering edit mode
9.6 years ago
matted 7.7k

As best I can tell, this is a new feature arising from bwa mem's ability to generate chimeric alignments. This is where one read aligns jointly to multiple positions in the reference genome, for example the first half of the read to somewhere on chr1 and the second half to somewhere on chr2. Note that this is different from a multi-mapping read, where the entire read may be mapped multiple places.

To handle the split read case, bwa mem will generate a separate SAM record (line in the SAM file) for each aligning segment of a read. So if for example the first read of a pair gets split into two mapping segments, you could have three lines in the SAM file from that read pair (say two from the first read, one from the second). I believe it is possible for this to happen and still have all records marked as properly paired, if orientation and insert size constraints are fulfilled (flag 0x2 set in all SAM records). If this happens, you could get the odd numbers you observe.

The SAM spec has evolved to include a new flag, 0x800, that denotes the supplementary reads (all but the first, defined arbitrarily I think) in a multi-part (chimeric) alignment. I predict that if you first remove reads with the 0x800 flag set and then run flagstat, you will get an even number for the properly-paired count.

A note for completeness: flagstat just does very simple counts of how many SAM records have various flag fields set. The flag values depend completely on what the orginating aligner decided to do. Records properly paired are just those with flag 0x2, and records with itself and mate mapped are just those records where neither flag 0x4 nor 0x8 are set. Furthermore, I believe samtools currently ignores the new 0x800 flag.

1
Entering edit mode

By the way, how to check if the samtools ignores the new 0x800 flag? I use samtools 0.1.19-44428cd, thanks!

0
Entering edit mode

That is because of chimeric alignment. Heng replied. Thank you for your detailed explanation.

0
Entering edit mode
9.6 years ago
Dan D 7.4k

I think I can answer your first question. If I understand correctly, the "properly paired" statistic is the subset of mapped reads which are properly paired. It's possible that one or more mapped reads had a proper mate which did not map. Not 100% confident here, though the stat below the "properly paired" one seems to bolster my notion. Very interesting question.

I can give you a push in the correct direction on the second question. If you look at the SAM specification document, there's a tag overview starting on page 6. See that NH tag? You would want to fish out anything with a value of 1 for that tag. I'm not sure what a good tool is for that, though. I haven't had to do it yet.

0
Entering edit mode

Sorry, I don't think this is right. For bwa at least, it's not possible that "one or more mapped reads had a proper mate which did not map", since part of proper pairing is satisfying distance constraints. I think instead that the problem is a read having an odd number of proper mates, all properly mapping (see my answer for details).

0
Entering edit mode

So then a chimeric mapping like what you describe in your answer below would be an example of a pair which was not proper, but in which both reads mapped? Just checking my understanding here.

0
Entering edit mode

I think you can have a pair of reads which map properly, where the pair of reads reports an odd number of mapping segments in the SAM. All segments would be flagged as mapped and properly paired (I think). In the example, the segments would be properly paired (just so that it affects the count that the poster was originally having problems with).

I haven't used bwa mem much myself yet, but this is my understanding based on posts on the samtools-devel mailing list and the new SAM spec.

0
Entering edit mode

does bwa or bwa mem generate the NH tag? as far as I can find, it doesn't do so...