Question: How To Find The Locations Of A Short Specific Sequence In A Genome With 1 Or 2 Mismatches Allowed?
5.4 years ago by
William4.4k wrote:

We have a 23-nucleotide CRISPR target sequence, and I would like to find out whether it is also present at other locations in the genome.

The sequence directs a CRISPR RNA construct to introduce an indel mutation in the genome, and we would like to make sure that there is only one target locus. There is also one N in the nucleotide sequence.

Let's say the 23 nucleotide sequence is :


How do I find all the loci in a genome where this sequence matches, either exactly (well, with 1 mismatch on the N) or with an edit distance of, say, 2 or 3?

I tried BWA aln with a short 23 bp sequence from the human genome, using the parameters -l 23 -k 2, but it did not recover the location of the 23 bp sequence. Does bwa work with sequences of this length?

I tried BLAST, but I get back a lot of results and I can't control the maximum edit distance.

sequence bwa blast • 3.4k views
modified 4 days ago by Johan Zicola40 • written 5.4 years ago by William4.4k

PatMatch lets you control the number of mismatches and whether they may include insertions, deletions, and/or substitutions. A stand-alone version of the software is available, as posted about here in response to a related question. (In fact, at the referenced resource you can run it in your browser right now via a Jupyter environment.) As far as I can tell, you cannot break that number down further, for example to a maximum of 2 substitutions and 1 deletion.

written 8 months ago by Wayne290

But it looks like PatMatch only works for Arabidopsis.

written 4 months ago by chahat_u100

@chahat_u PatMatch definitely isn't limited to Arabidopsis. Look at the other post I pointed to here. Several websites offer PatMatch as a web tool for quite a few organisms beyond Arabidopsis; I list the ones I could find here. Additionally, as long as you have the sequence, you can launch a Binder session at the referenced resource, follow along with the example I set up, and use another genome.

written 4 months ago by Wayne290
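
The difference between counting only substitutions and also allowing insertions and deletions, as discussed in the PatMatch comments above, can be illustrated with a small dynamic-programming scan. This is only a sketch of the general technique (a Sellers-style approximate search) and not how PatMatch itself works; the function name approximate_match_ends is made up for illustration, and treating N in the pattern as a wildcard is an assumption that fits the question. It is meant for short test sequences, not whole genomes.

```python
def approximate_match_ends(text, pattern, max_edits):
    """Return (end, edits) pairs: pattern matches a substring of text that
    stops just before index `end` with at most max_edits edits
    (substitutions, insertions, or deletions)."""
    text, pattern = text.upper(), pattern.upper()
    m = len(pattern)
    column = list(range(m + 1))  # distances against the empty text prefix
    hits = []
    for end, base in enumerate(text, start=1):
        previous = column
        column = [0]  # a match may start at any text position
        for i in range(1, m + 1):
            p = pattern[i - 1]
            cost = 0 if p == "N" or p == base else 1  # N matches any base
            column.append(min(previous[i - 1] + cost,  # match or substitution
                              previous[i] + 1,         # extra base in the text
                              column[i - 1] + 1))      # pattern base missing from the text
        if column[m] <= max_edits:
            hits.append((end, column[m]))
    return hits


# Toy example: the second text lacks the T of CGTACG (one deletion).
print(approximate_match_ends("AAAACGTACGTTTT", "CGTACG", 0))
print(approximate_match_ends("AAAACGACGTTTT", "CGTACG", 1))
```

Neighbouring end positions can also fall within the limit, so one locus may be reported more than once; a real tool would merge such overlapping hits.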
4.0 years ago by
Maximilian Haeussler1.3k wrote:

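Yes, bwa will find it, but you need to change the parameters. Do not use the seeded mode; use the slower -N mode, which disables bwa's iterative search so that all hits within the allowed number of differences are reported: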

bwa aln -n 4 -o 0 -k 4 -N

The Sanger CRISPR site uses more or less these parameters.

written 4.0 years ago by Maximilian Haeussler1.3k

Hi, I tried your method to find the genomic location of a DNA sequence in the hg19 genome, and I ran the following command:

bwa aln -n 4 -o 0 -k 4 -N hg19.fasta testmotif.fq > out.sai

But the out.sai file seemed to contain only illegible output:

SAI  ÄÑÄø ˇˇˇ

Do you have some idea as to what could be going wrong?

written 4 months ago by chahat_u100
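
Regarding the unreadable out.sai: that is expected, since bwa aln writes a binary intermediate file that is meant to be converted to SAM with bwa samse rather than opened directly. The sketch below only builds the two command lines, reusing the parameters and file names from the comment above; the helper names are hypothetical, and it assumes hg19.fasta has already been indexed with bwa index.

```python
import subprocess


def bwa_aln_command(ref, reads, max_diff=4):
    """Argument list for the `bwa aln` call quoted in the thread."""
    return ["bwa", "aln", "-n", str(max_diff), "-o", "0",
            "-k", str(max_diff), "-N", ref, reads]


def bwa_samse_command(ref, sai, reads):
    """Argument list for `bwa samse`, which converts a .sai file to SAM."""
    return ["bwa", "samse", ref, sai, reads]


def run_to_file(command, out_path):
    """Run a command and write its standard output to out_path."""
    with open(out_path, "wb") as handle:
        subprocess.run(command, stdout=handle, check=True)


# Usage, assuming bwa is installed and the files exist:
#   run_to_file(bwa_aln_command("hg19.fasta", "testmotif.fq"), "out.sai")
#   run_to_file(bwa_samse_command("hg19.fasta", "out.sai", "testmotif.fq"), "out.sam")
print(" ".join(bwa_samse_command("hg19.fasta", "out.sai", "testmotif.fq")))
```

The resulting out.sam is plain text, with the chromosome and position of each reported hit in its third and fourth columns.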
4 days ago by
Johan Zicola40 wrote:

Using bowtie (for example v0.12.7 here) to find off-targets for defined CRISPR-Cas9 target sequences:

Make the bowtie index for your genome (fasta file format)

bowtie-build -f genome.fa  genome_prefix

Search for your target sequence by allowing 1 mismatch (for your N) with the flag -n 1

 bowtie genome_prefix -n 1 -c GGAGCGAGCGGAGCGGTACANGG

It should recover your original sequence even with 1 mismatch (your N in this case). To allow 2 mismatches, use -n 2, and so on. The seed length is 28 by default, so you don't need to change it when working with CRISPR-Cas9 target sequences (typically 20 bp). See the bowtie documentation for more.

Note: I use bowtie here because bowtie2 allows at most 1 mismatch in the seed (its -N option accepts only 0 or 1), which is a drawback in this case.

written 4 days ago by Johan Zicola40
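
To sanity-check what an aligner reports, a brute-force scan over a small sequence (a test region, not a whole genome) can list every window within a given number of substitutions. This is a minimal sketch under stated assumptions, not part of bowtie: the function names are made up, N in the query is treated as matching any base, both strands are searched, and insertions and deletions are not considered.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (N stays N)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def count_mismatches(query, window):
    """Count substitutions between equal-length strings; N in the query matches any base."""
    return sum(1 for q, w in zip(query, window) if q != "N" and q != w)


def find_hits(genome, query, max_mismatches=2):
    """Yield (0-based start, strand, mismatches) for every window within the limit."""
    genome, query = genome.upper(), query.upper()
    for strand, pattern in (("+", query), ("-", reverse_complement(query))):
        for start in range(len(genome) - len(pattern) + 1):
            n = count_mismatches(pattern, genome[start:start + len(pattern)])
            if n <= max_mismatches:
                yield start, strand, n


# Toy genome: the target from the answer above (N replaced by T) with short flanks.
target = "GGAGCGAGCGGAGCGGTACANGG"
toy_genome = "TTTT" + "GGAGCGAGCGGAGCGGTACATGG" + "AAAA"
print(list(find_hits(toy_genome, target, max_mismatches=0)))  # [(4, '+', 0)]
```

When comparing against bowtie output, keep in mind that bowtie reports one alignment per read by default, so as far as I know an off-target search would normally add -a (or -k) to list all of them.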
5.4 years ago by
Philadelphia, PA
Jeremy Leipzig18k wrote:

vmatch is an excellent general aligner

The Vmatch large scale sequence analysis software

modified 5.4 years ago by Istvan Albert ♦♦ 78k • written 5.4 years ago by Jeremy Leipzig18k