Is anyone aware of any tool that is able to perform error-tolerant pattern-matching search on protein FASTA files?
This is a duplicate of my question from bioinformatics.stackexchange.
For example, I want to know which proteins in my FASTA file match the regexp ADNG..C.G (which represents the ADNGCG pattern). However, I want to be tolerant of matching errors, meaning that I'm fine with any protein that differs in any 1 letter of the motif: MDNG..C.G etc. are all fine. Running all possible variants through grep is possible, but takes exponentially long for longer patterns (I usually have 15-20 letters in a pattern, and scan
I am aware of the tool agrep (docs). As far as I know, it is no longer supported, and it does not distinguish the letters in which an error is allowed from those in which it is not. It also does not support long enough patterns (with more than 9 errors; and yes, I tried to recompile it myself from here and could not get it to work). Finally, it is not designed for proteins specifically.
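To make the behavior I am after concrete, here is a minimal Python sketch, not a replacement for a dedicated tool. It assumes that errors are substitutions only (no insertions or deletions), that a `.` in the pattern matches any residue and never counts as an error, and that every other pattern position is a single literal letter. The names `read_fasta` and `fuzzy_hits` are made up for this illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fuzzy_hits(seq: str, pattern: str, max_mismatches: int = 1) -> List[Tuple[int, int]]:
    """Return (start, mismatches) for each window within the mismatch budget.

    Assumption: only substitutions are counted; '.' positions are free.
    """
    # Only the literal positions of the pattern can produce a mismatch.
    fixed = [(i, c) for i, c in enumerate(pattern) if c != "."]
    hits = []
    for start in range(len(seq) - len(pattern) + 1):
        mismatches = 0
        for i, c in fixed:
            if seq[start + i] != c:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # budget exceeded, give up on this window early
        else:
            hits.append((start, mismatches))
    return hits


# Toy input; real use would pass an open file handle to read_fasta instead.
example = """>prot1
MKADNGTTCAGLL
>prot2
MKMDNGTTCAGLL
>prot3
MKMDNGTTCAALL
""".splitlines()

for name, seq in read_fasta(example):
    print(name, fuzzy_hits(seq, "ADNG..C.G", max_mismatches=1))
```

On this toy input, prot1 matches exactly, prot2 matches with one substitution, and prot3 is rejected because it needs two. The cost is roughly sequence length times the number of literal pattern positions, so it does not blow up the way enumerating every variant for grep does. Restricting errors to particular letters could be done by filtering `fixed` into "strict" and "tolerant" positions, which is not shown here.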