The pattern matching tool offered by the Saccharomyces Genome Database (SGD) and other genome sites is based on PatMatch.
SGD has a nice, concise guide to the syntax for PatMatch patterns. PatMatch patterns allow use of N, X, or . to stand for any residue or base, and thus are more familiar to biologists than regular expressions. PatMatch also allows use of IUPAC ambiguity codes.
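To make the comparison with regular expressions concrete, here is a minimal Python sketch of what a nucleotide pattern with ambiguity codes looks like once it is spelled out as a regex. This is only an illustration and not how PatMatch itself is implemented; the names `IUPAC_DNA`, `iupac_to_regex`, and `find_matches` are made up for this example. It handles only plain DNA letters on one strand, treats N, X, and . all as "any base", and ignores the rest of the PatMatch syntax (repeats, mismatches, and so on).

```python
import re

# Standard IUPAC nucleotide ambiguity codes written as regex character classes.
# Treating N, X, and "." all as "any base" is a simplification for this sketch.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]", "X": "[ACGT]", ".": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Translate a simple IUPAC DNA pattern into a regular expression string."""
    return "".join(IUPAC_DNA[ch] for ch in pattern.upper())


def find_matches(pattern, sequence):
    """Return (start, matched_text) for each non-overlapping forward-strand hit."""
    regex = re.compile(iupac_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence.upper())]


# Hypothetical toy sequence, searched with a TATA-box-like pattern.
hits = find_matches("TATAWAWR", "GGCTATAAAAGGCCTATATATACC")
print(iupac_to_regex("TATAWAWR"))  # TATA[AT]A[AT][AG]
print(hits)                        # [(3, 'TATAAAAG'), (14, 'TATATATA')]
```

The eight-letter pattern turns into a noticeably longer regex, which is the readability gain the PatMatch syntax is after. Start positions here are 0-based, as is usual in Python.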
You can run the PatMatch software yourself. I have a GitHub repository from which you can launch environments, served via the MyBinder.org service, with PatMatch already installed. The launched sessions include several notebooks demonstrating how to use it with any genome sequence you provide, as well as how to combine PatMatch results with Python for downstream analysis. Go to my patmatch-binder repo, click on the launch binder badge, and work through the Jupyter notebooks once the session launches.
You can also run the software on CyVerse in their VICE offerings.