Dear All,
In my project, I want genes that share a stretch of at least 100 amino acids, at more than 90% identity, with genes in my database. Can I run BLAST with the -perc_identity option set as high as I want? I am afraid there will be cases where a hit and a query align over a stretch of more than 500 amino acids with a percent identity near 60%, yet a sub-alignment of the two proteins is shorter but has high enough identity (for example, alignment length = 150 amino acids, percent identity > 90%). In such a case, even though the highest-scoring alignment (500 amino acids, 60% identity) is of no interest to me, it contains a sub-alignment that is precisely what I am looking for. So my question is: in this hypothetical case, if I set -perc_identity to 90, will BLAST report the hit? Or will it miss it because the percent identity of the highest-scoring alignment is below 90?
If BLAST is not suited to my application, can you suggest an alternative?
Best wishes,
BLAST returns the local alignments that maximize the alignment score (and so minimize the E-value); you can't force it to align all 100 amino acids of your query sequence in a global manner. So the answer is yes, it will return that high-scoring local alignment (what you call a "subalignment"), because that's what BLAST is for.
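If you want to check this yourself for a given long, low-identity alignment, one option is to scan the aligned sequences with a sliding window. The following is only a minimal sketch, not something BLAST does for you: the function name best_identity_window is made up for illustration, it assumes you have the two aligned strings with "-" marking gaps (for example, the qseq and sseq fields of BLAST's tabular output), and it counts alignment columns rather than ungapped residues.

```python
def best_identity_window(aligned_query, aligned_subject, window=100):
    """Find the alignment window with the highest percent identity.

    Both inputs are the aligned sequences of one alignment, of equal
    length, with '-' marking gaps. Returns (start_column, percent_identity)
    for the best window, or None if the alignment is shorter than `window`.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    n = len(aligned_query)
    if n < window:
        return None

    # 1 for a column with identical residues, 0 for mismatches and gaps.
    matches = [int(q == s and q != "-")
               for q, s in zip(aligned_query, aligned_subject)]

    # Slide the window one column at a time, updating the match count.
    current = sum(matches[:window])
    best_start, best_count = 0, current
    for start in range(1, n - window + 1):
        current += matches[start + window - 1] - matches[start - 1]
        if current > best_count:
            best_start, best_count = start, current

    return best_start, 100.0 * best_count / window


# Toy alignment: 150 identical columns followed by 350 columns at 50% identity.
query = "A" * 150 + "K" * 350
subject = "A" * 150 + "KL" * 175
result = best_identity_window(query, subject, window=100)
if result is not None and result[1] > 90:
    print("contains a window of 100 columns above 90% identity at column", result[0])
```

A hit would then be kept whenever the best window passes the threshold, regardless of the identity of the full alignment.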
blat may be a better tool for this case. BLAT on DNA is designed to quickly find sequences of 95% and greater similarity of length 40 bases or more. So if those limitations work for you, take a look at LINK.