I have a long list of very short sequences (6-10nt long) that have to be aligned against a very small part of a mammalian genome (2-5Mb).
The region is not continuous but derived from a few thousand smaller fragments. The index file/database is created as the collection of these individual fragments (not concatenated).
The problem is that mismatches + indels (up to two per sequence) are expected and I want to keep all the multimapped positions.
The aligner should be quite fast and as exhaustive as possible.
I have already tried bwa, novoalign, blast, gem, bowtie2 and soap + dynamic programming (which works but is very slow).
All the tested aligners have different problems with this task.
So far bowtie2 seems to be the most thorough, but it takes significant time to run in local mode.
Any suggestions? Since the sequences are quite short the aligner does not have to be derived only from the NGS cosmos.
Some further info based on the received comments:
The sequences are binding conformations and the index sequences are bona fide binding sites. Therefore there are no false positives and we don't expect unique alignments for each sequence.
The optimal aligner should return results residing in the best stratum: i.e. if there are 50 matches with no mismatches then there is no need to report results with e.g. 2 mismatches and an indel.
A large number of returned results is not a problem; it is actually desired. Therefore the aligner should report all multimaps belonging to the same stratum.
The identified sites will be filtered based on site properties and the true sites can be identified. However, you need the candidate regions to do the filtering.
We have found aligners that can partly solve the problem but some do not report all multimaps, others have issues with indels or are too slow.
We have started coding an in-house aligner for the task but it would save time if there is something ready out there.
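To make the requested behaviour concrete, here is a minimal sketch of the best-stratum rule as a plain dynamic-programming scan. It is only an illustration and not our in-house aligner. It assumes unit cost for mismatches and indels, searches the forward strand only, and reports end positions rather than full alignments. The names `best_stratum_hits`, `fragments` and `max_edits` are made up for this example.

```python
def best_stratum_hits(query, fragments, max_edits=2):
    """Find all approximate occurrences of `query` in the best stratum.

    `fragments` is a dict {name: sequence} (hypothetical layout, one entry
    per index fragment). Returns (best_distance, hits), where each hit is
    (fragment_name, end), with `end` the 0-based exclusive end of the match.
    Returns (None, []) if nothing is within `max_edits`.
    """
    best = max_edits + 1      # sentinel: worse than any acceptable stratum
    hits = []
    m = len(query)
    for name, seq in fragments.items():
        # Semi-global edit distance (Sellers' algorithm), one column at a time.
        # prev[i] = cost of aligning query[:i] to a suffix of the text so far.
        prev = list(range(m + 1))
        for j, base in enumerate(seq, start=1):
            curr = [0]        # a match may start anywhere in the fragment
            for i in range(1, m + 1):
                cost = 0 if query[i - 1] == base else 1
                curr.append(min(prev[i - 1] + cost,  # match / mismatch
                                prev[i] + 1,         # extra base in fragment
                                curr[i - 1] + 1))    # extra base in query
            d = curr[m]
            if d < best:
                # Found a better stratum: discard everything seen so far.
                best, hits = d, [(name, j)]
            elif d == best and d <= max_edits:
                # Same stratum: keep every multimap.
                hits.append((name, j))
            prev = curr
    if best > max_edits:
        return None, []
    return best, hits


if __name__ == "__main__":
    frags = {"f1": "TTACGTAA", "f2": "CCACGTCC"}
    print(best_stratum_hits("ACGT", frags))
```

This costs O(query length x fragment length) per query, which is why this kind of approach is too slow for a long list of sequences. With indels allowed, neighbouring end positions can fall into the same stratum and would need to be collapsed afterwards.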
--Edit2 based on comments
I would like to thank fellow biostars for trying to point us in another direction and help us avoid a mistake or a loss of time.
However, what has been asked is exactly what we're looking for: "Does anyone know of a (fast) aligner that can handle (many) multimaps and indels for short sequences".