Bowtie2 classification of discordantly mapped pairs
0
1
Entering edit mode
7.2 years ago
nthe ▴ 30

Hi,

I have mapped genomic Illumina 2x125bp reads to a reference transcriptome with bowtie2 in --very-sensitive-local mode. Mostly, the output makes sense, but I'm confused about why bowtie2 is calling some of my mapped read pairs discordant (based on the bit flag and the YT:Z:DP tag). A few examples from the sam output:

HWI-D00550:274:C77P6ANXX:5:1310:7220:87933    161    Contig138    157    42    45S80M    =    281    294    TATATATACACACGCATGTAATATTTTTTAGGTCTAAAAATAAAGTGAGAGATGGTCCAAGAAAAGAAACTTAATGTAATGTTAACGTCTTCATTTATTAAAACTGGGACAATGACAGTTTACTT    BBBC@GEGGGGEGGGGGGGGGD@:DFCDEG1BF1C1<EF>GCDGGGF10=C=BBDFF:0=E00@@FGBGBG@FCCGG@CC@@@BGG///8/;/BF?GGEGEEG/F/C/>/:>G/77/BF//B//9    AS:i:141    XN:i:0    XM:i:3    XO:i:0    XG:i:0    NM:i:3    MD:Z:19G21T30A7    YS:i:245    YT:Z:DP    RG:Z:11675X173
HWI-D00550:274:C77P6ANXX:5:1310:7220:87933    81    Contig138    281    42    125M    =    157    -294    TCTGAACACAATATATTAAAACCAGTCTTAACATGAAAAATGCACTTGTGTTGTATCAAAATGTTTTCATACAGATGTGGCCGCAGGCAGATGTGCACAACAGCTGCCAAAACATGGATTTTTTT    9=5/GGCFCGGEGGGGGGGC:GEEEGCGF@GGGGGGCGGFEGGEGGGF@FCFFEGBEGGGDBDGECF1GGFEGGGGGGGGGGGGGGGFGGGGGGGFDGGGGGGGGGGGGGGGGGGGFGGEA03BA    AS:i:245    XN:i:0    XM:i:1    XO:i:0    XG:i:0    NM:i:1    MD:Z:121A3    YS:i:141    YT:Z:DP    RG:Z:11675X173
HWI-D00550:274:C77P6ANXX:5:2306:1952:38805    161    Contig245    1181    36    46S45M    =    1304    281    GTTGTATGTAGAAAAGAGAAACACACCTAATATGTTTGGCTATTACCTTTTTCTCCAGCTTCATGTCCAGCCTCAGATCGATGAAAATGGG    BBBBBGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEDGGC0=EFB@GGGGCE>FDFEGG0:0BB>FGFGDFGG>    AS:i:83    XN:i:0    XM:i:1    XO:i:0    XG:i:0    NM:i:1    MD:Z:36AYS:i:204    YT:Z:DP    RG:Z:11675X173
HWI-D00550:274:C77P6ANXX:5:2306:1952:38805    81    Contig245    1304    36    112M13S    =    1181    -281    GGGATAGTACCTCATTATGTACACGTCATCTACAGGCGCCCAAGCCGTCTTCCCAAAGGGCTTGTGTCTGTCTGTGTCATCAATGACCTTCTTTTCTTTCTTTTCTTTCTTTTCTTTCTTTGCAT    GGG==B@B6-A9/=CGFF/4.FEG=GGGGGGGGGGAGFGFGGGA>FGGGGGGGEGEDFCFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGGGEEB>@=B    AS:i:204    XN:i:0    XM:i:3    XO:i:0    XG:i:0    NM:i:3    MD:Z:24A13T30T42    YS:i:83    YT:Z:DP    RG:Z:11675X173


For both of these pairs, the mapping quality for both mates was high (>36), the inferred fragment length is greater than the minimum of 250 I had specified in the -I parameter (I also tried mapping with -I 0 and still saw these pairs classified as discordant), and the read pair orientation indicated by the leftmost mapping positions of the mates mapping to the forward and reverse strands is similar to what I see for read pairs that bowtie2 classifies as concordantly mapped (so should be forward-reverse).

I would really appreciate it if someone can help point out what I'm missing here and why these reads should not be considered mapped as a proper pair.

Thanks very much!

discordant-pairs bowtie2 alignment • 3.0k views
3
Entering edit mode

I think it's because your actual mapped DNA region is less than 250 for both.

HWI-D00550:274:C77P6ANXX:5:1310:7220:87933    81    Contig138    281    42    125M    =    157    -294


So it starts at 281, goes for 125bp, and the start of the mate is -294 backwards. This would put it at 112.

But the mate doesn't start at 112, its POS (of first mapped position) is 157. That's because there's 45bp of soft-clipped stuff before it. 112+45 = 157.

So how much is actually mapped?

(281+125M) - 157 = 249bp

The other pair is 235bp

Maybe that explains it? I don't really know much about Bowtie2 and why it does the things it does.

1
Entering edit mode

Thank you very much for pointing that out, John! That makes sense. I thought the softclipped parts would be included in the estimation of fragment sizes, but I can see the logic in not doing that. I really appreciate your help!

0
Entering edit mode

No worries, if it's any consolation, I spent about 20 minutes looking at read flags, Bowtie2 tags, phred scores - even drawing little pictures of read mapping orientation - before figuring out TLEN might not be the distance from the first mapped base to the last :P They don't make it easy for us eh!! :)

0
Entering edit mode

Can u try with --no-discordant without mentioning -l and see what happens to the reads? Maybe you can subset few reads which includes the above mentioned reads. I could also see the reads are soft clipped, and I don't know if it effects the insert size calculation.

1
Entering edit mode

Thanks for this suggestion. I thought that I had already done this, but I must have made a mistake because now when I tried mapping again without specifying the -I parameter (so using the default of 0), bowtie2 classifies these two read pairs as concordant. So the softclipped parts of the reads must have been what caused the confusion, as John suggested. Thanks!

0
Entering edit mode

Glad to know. :-)