I recently got quite confused by two SAM flags in my BWA alignments: "supplementary alignment", which comes from chimeric alignments, and "not primary alignment" (or "secondary alignment"), which comes from multiple mapping.
The SAM specification explains these two flags as follows (see https://samtools.github.io/hts-specs/SAMv1.pdf):
A chimeric alignment is primarily caused by structural variations, gene fusions, misassemblies, RNA-seq or experimental protocols. It is more frequent given longer reads. For a chimeric alignment, the linear alignments constituting the alignment are largely non-overlapping. Typically, one of the linear alignments in a chimeric alignment is considered the "representative" alignment, and the others are called "supplementary" and are distinguished by the supplementary alignment flag.
In contrast, multiple mappings are caused primarily by repeats. They are less frequent given longer reads. If a read has multiple mappings, all these mappings are almost entirely overlapping with each other. In multiple mapping, one of these alignments is considered "primary". All the other alignments have the "secondary" alignment flag set in the SAM records that represent them.
However, in my ChIP-seq alignments from BWA (run without the -M option), the alignments flagged "supplementary" overlap the "representative" alignments, so by the description above I would expect them to be "secondary" alignments instead. For example, I got four alignments for one pair of reads:
HWI-C00135:237:CAR2BANXX:1:1101:6737:91207 163 chr6 144444720 60 61M40S = 144444728 61 GTACACACATATACACAGTGCTAAGTTCATTGTACACACATATACACAGTGCTAACTTCATTGTACACACATATACACAGTGCTAAGTTCATTGTACACAC BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:1 MD:Z:1C59 AS:i:59 XS:i:0 SA:Z:chr6,144444722,+,33S59M9S,60,2;
HWI-C00135:237:CAR2BANXX:1:1101:6737:91207 2131 chr6 144444722 11 56H45M = 144444720 -47 ACACACATATACACAGTGCTAAGTTCATTGTACACACATATACAC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB NM:i:0 MD:Z:45 AS:i:45 XS:i:20 SA:Z:chr6,144444728,-,53M48S,11,0;
HWI-C00135:237:CAR2BANXX:1:1101:6737:91207 2211 chr6 144444722 60 33H59M9H = 144444728 59 ACACACATATACACAGTGCTAACTTCATTGTACACACATATACACAGTGCTAAGTTCAT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:2 MD:Z:22G30C5 AS:i:49 XS:i:0 SA:Z:chr6,144444720,+,61M40S,60,1;
HWI-C00135:237:CAR2BANXX:1:1101:6737:91207 83 chr6 144444728 11 53M48S = 144444720 -61 ATATACACAGTGCTAAGTTCATTGTACACACATATACACAGTGCTAACTTCATTGTACACACATATACACAGTGCTAAGTTCATTGTACACACATATACAC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB NM:i:0 MD:Z:53 AS:i:53 XS:i:49 SA:Z:chr6,144444722,-,56S45M,11,0;
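To see which part of each read these four records cover, here is a minimal Python sketch that converts a CIGAR string into an interval along the read. The helper name `read_interval` is my own and not part of any library. It counts hard clips (H) as well as soft clips (S), so primary and supplementary records share one coordinate system. It also assumes the records being compared are on the same strand, which holds here within each read (both read 1 records are reverse, both read 2 records are forward).

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def read_interval(cigar):
    """Return (start, end, read_length) for the aligned part of a read.

    Coordinates are 0-based and half-open, counted along the read in the
    orientation of the alignment. Leading and trailing S/H clips are
    treated as unaligned read bases.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

    # Sum the clipped bases at the start of the CIGAR.
    i, start = 0, 0
    while i < len(ops) and ops[i][1] in "SH":
        start += ops[i][0]
        i += 1

    # Sum the clipped bases at the end of the CIGAR.
    j, trailing = len(ops) - 1, 0
    while j >= i and ops[j][1] in "SH":
        trailing += ops[j][0]
        j -= 1

    # M, I, = and X consume read bases; D, N and P do not.
    aligned = sum(n for n, op in ops[i:j + 1] if op in "MI=X")
    end = start + aligned
    return start, end, end + trailing


# CIGAR strings copied from the four records above.
records = {
    "read 2 primary (163)": "61M40S",
    "read 1 supplementary (2131)": "56H45M",
    "read 2 supplementary (2211)": "33H59M9H",
    "read 1 primary (83)": "53M48S",
}
for label, cigar in records.items():
    print(label, read_interval(cigar))
```

Under these assumptions, the read 2 primary covers read bases 0 to 61 and its supplementary covers 33 to 92, while the read 1 primary covers 0 to 53 and its supplementary covers 56 to 101. This only describes positions along the read; the reference positions are the POS values in the records themselves.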
The 2nd and 3rd alignments, with flags 2131 and 2211, are marked as "supplementary", yet they are fragments of the other two full-length alignments. I found no reads with the "secondary" flag in my results, and every alignment with the "supplementary" flag that I checked looks like the case shown above.
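For reference, the FLAG values can be broken into their individual bits. The sketch below uses the bit values from the FLAG table in the SAM specification (0x100 for secondary, 0x800 for supplementary); the short labels and the function name `decode_flag` are my own choices for illustration.

```python
# Bit values from the FLAG table in the SAM specification.
# The labels are informal names chosen for this example.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


# The four FLAG values from the records above.
for flag in (163, 2131, 2211, 83):
    print(flag, decode_flag(flag))
```

Decoded this way, 2131 is 83 plus 0x800 and 2211 is 163 plus 0x800, so each supplementary record carries the same pairing and strand bits as the primary record of the same read, and none of the four has 0x100 set.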
Can anyone help explain this? Should I remove these "supplementary" alignments to keep only uniquely mapped reads? Thanks very much.