Question: bowtie2 '-a' does not return all alignments
0
gravatar for shim
4 weeks ago by
shim10
shim10 wrote:

Hi I have an amplicons library with a few hundred targets that are similar, I expect no gaps, and for simplicity I expect no mismatches (so I penalize).

I use bowtie2 2.4.1 with the following command:

bowtie2 -a -D 100 -R 5 --local --no-unal --gbar 800 --ma 2 --mp 50,50 --np 50 --rdg 50,30 --rfg 50,50 --threads 1 -x ../library/index/libidx -U /scratch/file.fastq -S /scratch/out.sam

I look at alignments with NM:i:0, the problem is that '-a' doesn't return all possible alignments (I tried with and without '-D', '-R' options). When I look manually (simple search w/o MM) for some aligned sequence that it finds, I see there are many more alignments that it missed, sometimes there is only 1 alignment listed with higher score and many alignments with lower score. I know that bowtie2 wasn't designed for these situations in terms of performance, but so far it was fast enough for me.

Is it a well known problem? is there an undocumented limitation?

in the example SAM output below (single read, not all alignments, NM:i:0 alignments, no header, sequences+QS columns removed), all reads should be mapped to WT_Reference (only the WT was sequenced in this case), but there are many other alignments Variant_20, with CIGAR=388M197S, is the best alignment with NM:i:0, it is found 21 times (including WT_Reference) but is listed only once. The library size is 205 sequences (WT_Reference + Variant_1...Variant_204)

thanks, Shim

@PG ID:bowtie2  PN:bowtie2  VN:2.4.1    "CL:""/Tools/Bowtie/bowtie2-2.4.1-linux-x86_64/bowtie2-align-s --wrapper basic-0 -a -D 100 -R 5 --local --gbar 800 --ma 2 --mp 50,50 --np 50 --rdg 50,30 --rfg 50,50 --threads 1 -x ../library/index/libidx --passthrough -U /scratch/file.fastq"""                                                 
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_20: 602 255 388M197S    *   0   0   AS:i:776    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:388    YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_25: 602 255 161M424S    *   0   0   AS:i:322    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:161    YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_26: 602 255 160M425S    *   0   0   AS:i:320    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:160    YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_21: 602 255 137M448S    *   0   0   AS:i:274    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:137    YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_77: 602 255 79M506S *   0   0   AS:i:158    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:79 YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_76: 602 255 79M506S *   0   0   AS:i:158    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:79 YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_72: 602 255 79M506S *   0   0   AS:i:158    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:79 YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_74: 602 255 79M506S *   0   0   AS:i:158    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:79 YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_78: 602 255 79M506S *   0   0   AS:i:158    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:79 YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_75: 602 255 79M506S *   0   0   AS:i:158    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:79 YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_73: 602 255 79M506S *   0   0   AS:i:158    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:79 YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_172:    602 255 68M517S *   0   0   AS:i:136    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:68 YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_175:    602 255 68M517S *   0   0   AS:i:136    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:68 YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_136:    602 255 68M517S *   0   0   AS:i:136    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:68 YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_170:    602 255 68M517S *   0   0   AS:i:136    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:68 YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_171:    602 255 68M517S *   0   0   AS:i:136    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:68 YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_135:    602 255 68M517S *   0   0   AS:i:136    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:68 YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_139:    602 255 68M517S *   0   0   AS:i:136    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:68 YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_140:    602 255 68M517S *   0   0   AS:i:136    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:68 YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_173:    602 255 68M517S *   0   0   AS:i:136    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:68 YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_176:    602 255 68M517S *   0   0   AS:i:136    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:68 YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_137:    602 255 68M517S *   0   0   AS:i:136    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:68 YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_174:    602 255 68M517S *   0   0   AS:i:136    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:68 YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_138:    602 255 68M517S *   0   0   AS:i:136    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:68 YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_141:    602 255 68M517S *   0   0   AS:i:136    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:68 YT:Z:UU
M01015:88:000000000-J5K85:1:1101:12543:1681 256 Variant_185:    602 255 67M518S *   0   0   AS:i:134    XS:i:910    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:67 YT:Z:UU
...
score sam alignment bowtie2 • 105 views
ADD COMMENTlink modified 4 weeks ago • written 4 weeks ago by shim10

Short read aligners were never designed to be mathematically optimal or to produce all equally scoring alignments. Thus I would not recommend using that for this type of use cases.

ADD REPLYlink written 4 weeks ago by Istvan Albert ♦♦ 84k

Thank you Istvan, I'm aware short read aligners don't produce all possible alignments, but I do expect a high coverage of the best alignments (e.g. > 95%; and missing 20/21 of a particular score' alignments is a problem). Often there are "Effort options" (like bowtie2's '-D', '-R') that should produce more results probably at a significant performance cost, but, as I mentioned, that didn't help.

Do you know a tool that better suits my use case? i.e. mapping many fastq short reads to < 300 variants that are similar, it would be best if it output all the best-rank-alignments (similar to bowtie1's -a --best --strata).

PS my analysis is somewhat similar to microbiome analysis, but so far I found only mothur tool (Schloss, et al., 2009) which I'm not sure suits my pipeline..

best, shim

ADD REPLYlink written 4 weeks ago by shim10

Try bbmap.sh as an alternate. It has a ambig=all option that should report all possible alignments for a read. There is a guide available. There are many options available on command line. You will need to spend some time experimenting.

ADD REPLYlink written 4 weeks ago by genomax87k

Thank you very much genomax, I will try bbmap (I saw it in microbiome pipeline somewhere but never used it)

ADD REPLYlink written 4 weeks ago by shim10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1559 users visited in the last hour