Matching Strings With Mismatches
11.0 years ago
Krisr ▴ 470

I am using perl to match short nucleotide sequences against fasta sequences...

($GeneFasta =~ /$searchSeq/g)

I would like to perform this match, but allow for a mismatch in the search. Does anyone know if, and how, Perl can accomplish this?

Reply: This is a bad idea. Why don't you use a short-read aligner instead?

11.0 years ago

The Bio::Grep module is pretty good, as it provides a common interface to several different fuzzy matchers, my favorite being Vmatch.

11.0 years ago

agrep (i.e., approximate grep) is a nice tool for this sort of thing. It's not a standard Linux tool, but it is a good one. Here's one implementation: ftp://ftp.cs.arizona.edu/agrep/

From the README at the above URL:

"...for example, "agrep -2 homogenos foo" will find homogeneous as well as any other word that can be obtained from homogenos with at most 2 substitutions, insertions, or deletions."

Reply: Thanks. I'm impressed by the quality of this tool.

Reply: Yeah, believe it or not, 3 years ago I hacked it briefly as a short-read aligner.

11.0 years ago
Rm 8.1k

You are looking for a fuzzy pattern matching program. Try the Perl module String::Approx: "Perl extension for approximate matching (fuzzy matching)". For fuzzy pattern matching exercises and scripts, go through the VCU bioinformatics notes on pattern matching.

Reply: I've had some issues with that module: both false positives and misses.

11.0 years ago

Just assigning a regexp to a scalar will not work in Perl for sub-sequence pattern matches, e.g.

$searchSeq = "AAA[TA]";


Instead, you need to use the quote-regex operator (qr):

$searchSeq = qr/AAA[TA]/;
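
For the original question (a fixed search sequence with a mismatch allowed), a character class only helps when you already know which position may vary. A minimal sketch of a more general approach in plain Perl is below. It slides a window along the target and counts substitutions with a string XOR. The function name find_with_mismatches and the example sequences are made up for illustration, and the sketch assumes both sequences use the same case and that only substitutions matter (no insertions or deletions, unlike agrep).

```perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based start positions where $query
# matches $target with at most $max_mismatches substitutions.
# Insertions and deletions are NOT considered.
sub find_with_mismatches {
    my ($target, $query, $max_mismatches) = @_;
    my $len = length $query;
    my @hits;
    for my $pos (0 .. length($target) - $len) {
        my $window = substr($target, $pos, $len);
        # XOR of two equal-length strings yields "\0" wherever the bytes
        # agree, so the number of non-NUL bytes is the Hamming distance.
        my $xor        = $window ^ $query;
        my $mismatches = ($xor =~ tr/\0//c);
        push @hits, $pos if $mismatches <= $max_mismatches;
    }
    return @hits;
}

# Made-up example: "ACGA" occurs at offset 2 as "ACGT" (one mismatch).
my @hits = find_with_mismatches("GGACGTTTACCTAC", "ACGA", 1);
print "@hits\n";    # prints "2"
```

This checks every window, so it is fine for short search sequences against individual FASTA records but will be slow on large inputs. For those, a short-read aligner or one of the fuzzy-matching tools mentioned above is likely a better fit.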