Question

Mapping reads to repetitive positions in reference within edit distance and with 1 indel

0

Entering edit mode

4.4 years ago

pawlowac ▴ 80

I have a list of subsequences between 30 - 40 nt that I need to map to a reference. I expect these reads to align to several positions in the reference. I need to find all exact alignments, any alignments within a given edit distance, and also any positions that are off by a single indel.

I can do all that is required with bowtie1 except the indel.

BBmap looks like it should work for this purpose, but I can't seem to find a single indel using the following;

bbmapskinner.sh  in=kmers.fasta out=result.sam ambiguous=all strictmaxindel=1

Note: I haven't tried to catch all alignments within a given edit distance yet, but the sequence is exactly as in the reference but with a single deletion or insertion.

My issue overall is that I have many hundreds of subsequences that I need to map to ~800k relatively homogeneous sequences (rRNA) and this takes a very long time normally with Bowtie and a second step using the regex module in python.

Two related questions; 1. What am I doing wrong with BBmap? 2. Is there a better and faster way to do this? Note, I'm looking at edit distances ~25% of the 30 - 40 mer subsequences. I haven't strongly considered BLASTn because it doesn't perform well with short sequences, but maybe that is the best course of action here?

Additional information: The second pass with regex is because I need to take positions of mismatches into account. Bowtie does a good job of that, but I understand that bowtie is not guaranteed to find all mapping positions and I need to know exactly which reference sequences produce a match.

bbmap mapping bowtie • 753 views

ADD COMMENT • link 4.4 years ago by pawlowac ▴ 80