I have a collection of gene sequences. For each gene, I would like to identify any highly similar 150bp regions within a specific assembled genome.
I initially tried megablast: I blasted each query gene against the genome and looked for aligned regions with a sequence identity > 85%. This works for most genes, but it can miss real hits. For example, an aligned region of 200bp may have an identity of 84% to my query gene, whereas had the alignment been 40bp shorter, the identity would have risen above 85%.
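To make the problem concrete, here is a minimal sketch of the kind of check I have in mind: slide a fixed-size window along an existing pairwise alignment and report the best per-window identity, instead of relying on the identity of the whole alignment. The function name `best_window_identity` is made up for illustration. I am assuming the two aligned strings (with `-` for gaps) can be taken from tabular BLAST+ output via the `qseq` and `sseq` columns, that the window is measured in alignment columns, and that gap columns count as mismatches.

```python
def best_window_identity(aligned_query, aligned_subject, window=150):
    """Return (identity, start_column) for the best window of `window`
    alignment columns, or None if the alignment is shorter than the window.

    Assumptions: both strings come from the same pairwise alignment,
    gaps are written as '-', and a gap column counts as a mismatch.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned strings must have equal length")
    n = len(aligned_query)
    if n < window:
        return None

    # 1 for a matching, non-gap column; 0 otherwise.
    matches = [
        int(q == s and q != "-")
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
    ]

    # Rolling sum over the window, so the scan is linear in alignment length.
    current = sum(matches[:window])
    best, best_start = current, 0
    for start in range(1, n - window + 1):
        current += matches[start + window - 1] - matches[start - 1]
        if current > best:
            best, best_start = current, start
    return best / window, best_start


# Toy 200-column alignment: 32 mismatches clustered at the start,
# so the overall identity is 168/200 = 84%.
query = "A" * 200
subject = "C" * 32 + "A" * 168
print(best_window_identity(query, subject))
```

In this toy case the full alignment falls below the 85% cut-off, yet a 150-column window inside it passes easily. This only rescues hits that BLAST reported in the first place, so the search itself would still need a looser identity threshold.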
Tweaking BLAST parameters such as gap penalties is not a perfect solution, and searching Biostars for this particular problem proved difficult.
Do you have any suggestions on how to proceed?