I am working with sequencing data from a transposon mutagenesis library and want to identify the exact site of each transposon insertion into a plasmid (i.e. a 10 kb genome). I have many millions of unpaired 100 bp reads from the plasmid. My code aligns the reads to both the plasmid and the transposon sequence, then does some coordinate arithmetic ("DNA math") on reads that match both to locate the junction.
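For concreteness, the junction arithmetic might look roughly like the minimal Python sketch below. It is an illustration, not my actual code. It assumes that the read runs transposon end -> plasmid flank, that the transposon end (`tn_end`, a hypothetical parameter) appears verbatim in the read, that the 20 bp right after it are plasmid sequence occurring exactly once in the circular plasmid, and that all sequences are uppercase. `find_insertion_site` and `revcomp` are illustrative names.

```python
from typing import Optional, Tuple

# Complement table for uppercase DNA; N maps to itself.
_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMP)[::-1]


def find_insertion_site(
    read: str, tn_end: str, plasmid: str, flank_len: int = 20
) -> Optional[Tuple[int, str]]:
    """Return (0-based plasmid coordinate, strand) of the plasmid base that
    abuts the transposon end in `read`, or None if it cannot be placed uniquely.

    Hypothetical helper. Assumes the read is oriented transposon end -> plasmid
    flank, the transposon end occurs verbatim, and the plasmid is circular.
    """
    t = read.find(tn_end)
    if t < 0:
        return None  # no exact transposon end in this read
    start = t + len(tn_end)
    flank = read[start:start + flank_len]
    if len(flank) < flank_len:
        return None  # not enough plasmid sequence left in the read

    # Append the first flank_len - 1 bases so matches spanning the origin are found.
    circ = plasmid + plasmid[:flank_len - 1]
    L = len(plasmid)

    # Forward strand: the junction base is the first base of the match.
    pos = circ.find(flank)
    if pos >= 0:
        # A second, different hit means the flank is repeated: refuse to guess.
        return (pos, "+") if circ.rfind(flank) == pos else None

    # Reverse strand: the junction base is the last base of the matched segment.
    rc = revcomp(flank)
    pos = circ.find(rc)
    if pos >= 0:
        if circ.rfind(rc) != pos:
            return None
        return ((pos + flank_len - 1) % L, "-")
    return None
```

Returning None for repeated flanks avoids misplacing reads in duplicated plasmid regions, at the cost of some recall.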
I have tried a number of different aligners (BLAT, bowtie2, subalign), and none of them recovers 100% of the true-positive insertions in a synthetically generated library of reads with 20 bp of homology to both the plasmid and the transposon. The best recall I can get is about 92%. Is there an aligner designed for this task, namely identifying perfect or near-perfect 15-20 bp matches within short reads? Of course, the software must also be able to find matches that begin in the middle of a read.
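To illustrate the kind of matching I mean, here is a minimal pigeonhole seed-and-verify sketch in Python (my own illustration, not an existing tool). Every k-mer of the reference is indexed, seeds are looked up at every read offset so mid-read matches are found, and each candidate window is verified by Hamming distance. With `k = 10`, `match_len = 20` and `max_mm = 1`, any 20 bp window with at most one mismatch contains an exact 10-mer, so such matches cannot be missed. It only searches the forward strand (the reverse complement of the reference would need its own index), and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def build_kmer_index(ref: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer in ref to the list of positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return dict(index)


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_near_matches(
    read: str, ref: str, index: Dict[str, List[int]], k: int,
    match_len: int = 20, max_mm: int = 1,
) -> Set[Tuple[int, int]]:
    """Return (read_offset, ref_offset) pairs for every match_len window of
    the read that matches ref with at most max_mm mismatches (forward strand)."""
    # Pigeonhole: a window split into max_mm + 1 disjoint pieces of length k
    # with at most max_mm mismatches must contain one exactly matching piece.
    if k * (max_mm + 1) > match_len:
        raise ValueError("need k * (max_mm + 1) <= match_len")
    hits: Set[Tuple[int, int]] = set()
    # Seed at every read offset, so matches starting mid-read are not missed.
    for r in range(len(read) - k + 1):
        for p in index.get(read[r:r + k], ()):
            # The exact seed may be piece j of the window; back up to its start.
            for j in range(max_mm + 1):
                rs, ps = r - j * k, p - j * k
                if rs < 0 or ps < 0 or rs + match_len > len(read) or ps + match_len > len(ref):
                    continue
                if hamming(read[rs:rs + match_len], ref[ps:ps + match_len]) <= max_mm:
                    hits.add((rs, ps))
    return hits
```

A match longer than `match_len` shows up as several overlapping hits that would need merging. A smaller `k` (needed for shorter matches or more mismatches) tolerates more errors but produces more spurious seeds to verify.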
Thanks for your help!