Question: Bowtie2 classification of discordantly mapped pairs
1
gravatar for nthe
3.1 years ago by
nthe30
United States
nthe30 wrote:

Hi,

have mapped genomic Illumina 2x125bp reads to a reference transcriptome with bowtie2 in --very-sensitive-local mode. Mostly, the output makes sense, but I'm confused about why bowtie2 is calling some of my mapped read pairs discordant (based on the bit flag and the YT:Z:DP tag). A few examples from the sam output:

HWI-D00550:274:C77P6ANXX:5:1310:7220:87933    161    Contig138    157    42    45S80M    =    281    294    TATATATACACACGCATGTAATATTTTTTAGGTCTAAAAATAAAGTGAGAGATGGTCCAAGAAAAGAAACTTAATGTAATGTTAACGTCTTCATTTATTAAAACTGGGACAATGACAGTTTACTT    BBBC@GEGGGGEGGGGGGGGGD@:DFCDEG1BF1C1<ef>GCDGGGF10=C=BBDFF:0=E00@@FGBGBG@FCCGG@CC@@@BGG///8/;/BF?GGEGEEG/F/C/>/:>G/77/BF//B//9    AS:i:141    XN:i:0    XM:i:3    XO:i:0    XG:i:0    NM:i:3    MD:Z:19G21T30A7    YS:i:245    YT:Z:DP    RG:Z:11675X173
HWI-D00550:274:C77P6ANXX:5:1310:7220:87933    81    Contig138    281    42    125M    =    157    -294    TCTGAACACAATATATTAAAACCAGTCTTAACATGAAAAATGCACTTGTGTTGTATCAAAATGTTTTCATACAGATGTGGCCGCAGGCAGATGTGCACAACAGCTGCCAAAACATGGATTTTTTT    9=5/GGCFCGGEGGGGGGGC:GEEEGCGF@GGGGGGCGGFEGGEGGGF@FCFFEGBEGGGDBDGECF1GGFEGGGGGGGGGGGGGGGFGGGGGGGFDGGGGGGGGGGGGGGGGGGGFGGEA03BA    AS:i:245    XN:i:0    XM:i:1    XO:i:0    XG:i:0    NM:i:1    MD:Z:121A3    YS:i:141    YT:Z:DP    RG:Z:11675X173
HWI-D00550:274:C77P6ANXX:5:2306:1952:38805    161    Contig245    1181    36    46S45M    =    1304    281    GTTGTATGTAGAAAAGAGAAACACACCTAATATGTTTGGCTATTACCTTTTTCTCCAGCTTCATGTCCAGCCTCAGATCGATGAAAATGGG    BBBBBGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEDGGC0=EFB@GGGGCE>FDFEGG0:0BB>FGFGDFGG>    AS:i:83    XN:i:0    XM:i:1    XO:i:0    XG:i:0    NM:i:1    MD:Z:36AYS:i:204    YT:Z:DP    RG:Z:11675X173    
HWI-D00550:274:C77P6ANXX:5:2306:1952:38805    81    Contig245    1304    36    112M13S    =    1181    -281    GGGATAGTACCTCATTATGTACACGTCATCTACAGGCGCCCAAGCCGTCTTCCCAAAGGGCTTGTGTCTGTCTGTGTCATCAATGACCTTCTTTTCTTTCTTTTCTTTCTTTTCTTTCTTTGCAT    GGG==B@B6-A9/=CGFF/4.FEG=GGGGGGGGGGAGFGFGGGA>FGGGGGGGEGEDFCFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGGGEEB>@=B    AS:i:204    XN:i:0    XM:i:3    XO:i:0    XG:i:0    NM:i:3    MD:Z:24A13T30T42    YS:i:83    YT:Z:DP    RG:Z:11675X173


For both of these pairs, the mapping quality for both mates was high (>36), the inferred fragment length is greater than the minimum of 250 I had specified in the -I parameter (I also tried mapping with -I 0 and still saw these pairs classified as discordant), and the read pair orientation indicated by the leftmost mapping positions of the mates mapping to the forward and reverse strands is similar to what I see for read pairs that bowtie2 classifies as concordantly mapped (so should be forward-reverse). 

I would really appreciate it if someone can help point out what I'm missing here and why these reads should not be considered mapped as a proper pair.

Thanks very much!

 

                                         

 



 

ADD COMMENTlink modified 2.1 years ago by brentp22k • written 3.1 years ago by nthe30
3

I think it's because your actual mapped DNA region is less than 250 for both.

HWI-D00550:274:C77P6ANXX:5:1310:7220:87933    81    Contig138    281    42    125M    =    157    -294 
So it starts at 281, goes for 125bp, and the start of the mate is -294 backwards. This would put it at 112.

But the mate doesn't start at 112, its POS (of first mapped position) is 157. Thats because there's 45bp of softclipped stuff before it. 112+45 = 157.

So how much is actually mapped?
(281+ 125M) - 157 = 249bp

The other pair is 235bp

Maybe that explains it? I dont really know much about Bowtie2 and why it does the things it does... 

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by John12k
1

Thank you very much for pointing that out, John! That makes sense. I thought the softclipped parts would be included in the estimation of fragment sizes, but I can see the logic in not doing that. I really appreciate your help!

ADD REPLYlink written 3.1 years ago by nthe30

No worries, if it's any consolation, I spent about 20 minutes looking at read flags, Bowtie2 tags, phred scores -  even drawing little pictures of read mapping orientation - before figuring out TLEN might not be the distance from the first mapped base to the last :P They don't make it easy for us eh!! :)

ADD REPLYlink written 3.1 years ago by John12k

can u try with --no-discordant without mentioning -l and see what happens to the reads ? May be u can subset few reads which includes the above mentioned reads. I could also see the reads are soft clipped, and I don't know if it effects the insert size calculation.

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by geek_y8.8k
1

Thanks for this suggestion. I thought that I had already done this, but I must have made a mistake because now when I tried mapping again without specifying the -I parameter (so using the default of 0), bowtie2 classifies these two read pairs as concordant. So the softclipped parts of the reads must have been what caused the confusion, as John suggested. Thanks!

ADD REPLYlink written 3.1 years ago by nthe30

Glad to know. :-) 

ADD REPLYlink written 3.1 years ago by geek_y8.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 770 users visited in the last hour