Difference between chimeric alignments and multiple mapping
6.9 years ago
Vanilla ▴ 110

Hi all:

I am confused by two SAM flags in my BWA alignments: "supplementary alignment", which marks chimeric alignments, and "not primary alignment" (also called "secondary alignment"), which marks multiple mappings.

The SAM specification (https://samtools.github.io/hts-specs/SAMv1.pdf) describes the two cases as follows:

A chimeric alignment is primarily caused by structural variations, gene fusions, misassemblies, RNA-seq or experimental protocols. It is more frequent given longer reads. For a chimeric alignment, the linear alignments consisting of the alignment are largely non-overlapping. Typically, one of the linear alignments in a chimeric alignment is considered the "representative" alignment, and the others are called "supplementary" and are distinguished by the supplementary alignment flag.

In contrast, multiple mappings are caused primarily by repeats. They are less frequent given longer reads. If a read has multiple mappings, all these mappings are almost entirely overlapping with each other. In multiple mapping, one of these alignments is considered "primary". All the other alignments have the "secondary" alignment flag set in the SAM records that represent them.

However, in my ChIP-seq alignments from BWA (run without the -M option), the alignments flagged "supplementary" overlap their "representative" alignments, so by the description above I would expect them to be "secondary" alignments instead. For example, I got four alignments for one read pair:

HWI-C00135:237:CAR2BANXX:1:1101:6737:91207 163 chr6 144444720 60 61M40S = 144444728 61 GTACACACATATACACAGTGCTAAGTTCATTGTACACACATATACACAGTGCTAACTTCATTGTACACACATATACACAGTGCTAAGTTCATTGTACACAC BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:1 MD:Z:1C59 AS:i:59 XS:i:0 SA:Z:chr6,144444722,+,33S59M9S,60,2;

HWI-C00135:237:CAR2BANXX:1:1101:6737:91207 2131 chr6 144444722 11 56H45M = 144444720 -47 ACACACATATACACAGTGCTAAGTTCATTGTACACACATATACAC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB NM:i:0 MD:Z:45 AS:i:45 XS:i:20 SA:Z:chr6,144444728,-,53M48S,11,0;

HWI-C00135:237:CAR2BANXX:1:1101:6737:91207 2211 chr6 144444722 60 33H59M9H = 144444728 59 ACACACATATACACAGTGCTAACTTCATTGTACACACATATACACAGTGCTAAGTTCAT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:2 MD:Z:22G30C5 AS:i:49 XS:i:0 SA:Z:chr6,144444720,+,61M40S,60,1;

HWI-C00135:237:CAR2BANXX:1:1101:6737:91207 83 chr6 144444728 11 53M48S = 144444720 -61 ATATACACAGTGCTAAGTTCATTGTACACACATATACACAGTGCTAACTTCATTGTACACACATATACACAGTGCTAAGTTCATTGTACACACATATACAC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB NM:i:0 MD:Z:53 AS:i:53 XS:i:49 SA:Z:chr6,144444722,-,56S45M,11,0;

The 2nd and 3rd alignments, with flags 2131 and 2211, are marked "supplementary", yet they are fragments of the other two full-length alignments. I found no reads with the "secondary" flag in my results, and every "supplementary" alignment I checked looks like the case shown above.
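For readers following along, the flag values above can be unpacked bit by bit. This is a minimal sketch using the bit definitions from the SAM specification; the short names in `SAM_FLAGS` and the helper `decode_flag` are made up for illustration and are not part of samtools or BWA.

```python
# Bit values are taken from the SAM specification; the names are shorthand.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]

# The four flags from the read pair shown above.
for flag in (163, 2131, 2211, 83):
    print(flag, decode_flag(flag))
```

Flags 2131 and 2211 are simply 83 and 163 with the supplementary bit (0x800 = 2048) added, and none of the four has the secondary bit (0x100) set.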

Can anyone help explain this? Should I remove these "supplementary" alignments to keep uniquely mapped reads? Thanks very much.

Best, Vanilla

6.9 years ago

Those are proper supplementary alignments, since there's no way to map the entire read in a biologically coherent manner. A secondary alignment would occur if those reads mapped elsewhere in the genome (i.e., not overlapping like this).
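To see which part of the read each record covers, one can turn POS and CIGAR into read and reference intervals. The sketch below is only an illustration: `alignment_spans` is a hypothetical helper, and it reports read offsets in the orientation of the record as stored, so the two intervals are only directly comparable for records on the same strand (as with the forward-strand records flagged 163 and 2211 in the question).

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def alignment_spans(pos, cigar):
    """Return (read_start, read_end, ref_start, ref_end), all 1-based inclusive."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Leading soft/hard clips shift where the aligned part begins in the read.
    clipped = 0
    for length, op in ops:
        if op not in "SH":
            break
        clipped += length
    read_len = sum(n for n, op in ops if op in "MI=X")   # ops consuming read bases
    ref_len = sum(n for n, op in ops if op in "MDN=X")   # ops consuming reference
    return (clipped + 1, clipped + read_len, pos, pos + ref_len - 1)

primary = alignment_spans(144444720, "61M40S")          # flag 163
supplementary = alignment_spans(144444722, "33H59M9H")  # flag 2211
print(primary)        # (1, 61, 144444720, 144444780)
print(supplementary)  # (34, 92, 144444722, 144444780)
```

The supplementary record covers read bases 34 to 92, extending well past the 61 bases the primary record aligns, yet both end at the same reference coordinate. No single linear alignment can place all of those read bases, which is consistent with a read spanning more repeat copies than the reference contains.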

Anyway, yes, you can go ahead and remove those for most downstream applications. One caveat to this is that you might have a structural variation there (tandem repeat).
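Removing them amounts to dropping every record whose FLAG has the 0x800 bit set. On the command line this is commonly done with `samtools view -F 0x800` (or `-F 0x900` to drop secondary alignments as well). The Python sketch below does the same on plain SAM text; `keep_primary` and the shortened example records are illustrative assumptions, not real output.

```python
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

def keep_primary(sam_lines, drop_mask=SECONDARY | SUPPLEMENTARY):
    """Yield header lines and records with none of the drop_mask bits set."""
    for line in sam_lines:
        if line.startswith("@"):       # header lines pass through unchanged
            yield line
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if not flag & drop_mask:
            yield line

# Shortened, made-up records that reuse the four flags from the question.
records = [
    "@SQ\tSN:chr6\tLN:149736546",
    "read1\t163\tchr6\t144444720\t60\t61M40S",
    "read1\t2131\tchr6\t144444722\t11\t56H45M",
    "read1\t2211\tchr6\t144444722\t60\t33H59M9H",
    "read1\t83\tchr6\t144444728\t11\t53M48S",
]
kept = list(keep_primary(records))
for line in kept:
    print(line)
```

Only the header and the records flagged 163 and 83 survive the filter.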


Thanks Devon!

I agree. I just checked, and there is a repeat "ACACACATATACACAGTGCTAAGTTCATTGT" around chr6:144444722 on mm10, which causes the multiple alignments for these reads.

I also have many cases where the supplementary alignment maps elsewhere, so the read sequences overlap but the genome coordinates do not, like the following (the first alignment, with flag 2227, is the supplementary one):

HWI-C00135:237:CAR2BANXX:1:1203:17082:84650 2227    chr1    3236776 2   7H19M8I34M33H   chr11   93312384    0   AAAAACAAAGAGAGGGAGACGGAGTGTGAGGGAGAGAGAGAGAGAGAGAGAGAGCGAGAGA   ///<B//<F//B<</<////////<//<///<//////FFFFFFF///F<//<///B</</   NM:i:9  MD:Z:46T6   AS:i:34 XS:i:31 SA:Z:chr11,93312195,+,63M38S,53,2;

HWI-C00135:237:CAR2BANXX:1:1203:17082:84650 163 chr11   93312195    53  63M38S  =   93312384    290 CTTCACTCTCTGAGGAAACTGTTAACACAACGGTCTCTCGCTCTCTCTCTCTCTCTCTCTCTCCCTCACACTCCGTCTCCCTCTCTTTGTTTTTCTTTTGT   BBB/BFF/B<//<</FBBFFFFFFB/<BB</77/</<B///<//<F///FFFFFFF//////<///<//<////////</<<B//F<//B<//////////   NM:i:2  MD:Z:32C6T23    AS:i:53 XS:i:34 SA:Z:chr1,3236776,-,7S19M8I34M33S,2,9;

HWI-C00135:237:CAR2BANXX:1:1203:17082:84650 83  chr11   93312384    60  101M    =   93312195    -290    CTACAGTTGCAATCCATTTGCACATCTTCTTTCTGAAGACAAATCTGCATGTACTTTCTGTGTATCCTTGTATCAAGTGAGCCTATGGTGCTATAGTACTT   B//B///FFFFFB///<<<FBF/FFFFFFFFFFFBF<FBFB</BFF<F/FFFFBBFBF<F/<FFB<<FFF/FFFFFFFFFFF/FFFFBFFFFFFBB/BB//   NM:i:0  MD:Z:101    AS:i:101    XS:i:0

What do you think about such cases? Thanks!


Supplementary alignments are for any case where subsets of a read can be aligned in biologically impossible ways, so that makes sense. I wouldn't bother even considering the supplementary alignment.
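If you do want to check where the other pieces of a read landed before discarding them, the SA tag on each record lists them as `rname,pos,strand,CIGAR,mapQ,NM;` entries (the layout given in the SAM optional fields specification). A minimal parsing sketch follows; `SAEntry` and `parse_sa` are hypothetical names chosen for this example.

```python
from typing import NamedTuple

class SAEntry(NamedTuple):
    rname: str
    pos: int
    strand: str
    cigar: str
    mapq: int
    nm: int

def parse_sa(tag):
    """Parse an 'SA:Z:...' tag into a list of SAEntry tuples."""
    value = tag.split(":", 2)[2]          # drop the 'SA:Z:' prefix
    entries = []
    for chunk in value.split(";"):
        if not chunk:                      # the tag ends with a trailing ';'
            continue
        rname, pos, strand, cigar, mapq, nm = chunk.split(",")
        entries.append(SAEntry(rname, int(pos), strand, cigar, int(mapq), int(nm)))
    return entries

# SA tag from the chr11 primary record (flag 163) in the example above.
primary_chrom = "chr11"
others = parse_sa("SA:Z:chr1,3236776,-,7S19M8I34M33S,2,9;")
for entry in others:
    where = "same chromosome" if entry.rname == primary_chrom else "different chromosome"
    print(entry.rname, entry.pos, entry.mapq, where)
```

Here the supplementary piece sits on chr1 with a mapping quality of 2, while the primary alignment is on chr11 with a mapping quality of 53.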


Got it. So I will also just throw them away.

If you also have secondary alignments from multiple mapping (e.g. due to repeat sequences), should they also be discarded for ChIP-seq analysis? I'm also curious why I didn't get any secondary alignments.


Most aligners do not return secondary alignments by default. Yes, generally they get ignored (though not always, in case you're interested in something overlapping repeats).


Got it. Thanks very much for your help Devon!
