Question: How To Find The Locations Of A Short Specific Sequence In A Genome With 1 Or 2 Mismatches Allowed?
William (4.3k), Europe, wrote 5.1 years ago:

We have a 23-nucleotide CRISPR target sequence, and I would like to find out whether it is also present at other locations in the genome.

The sequence directs a CRISPR RNA construct to introduce an indel mutation in the genome, and we would like to make sure that there is only one target locus. There is also one N in the nucleotide sequence.

Let's say the 23-nucleotide sequence is:

GGAGCGAGCGGAGCGGTACANGG

How do I find all the loci in a genome where this sequence matches, either exactly (well, with 1 mismatch on the N) or with, say, an edit distance of 2 or 3?

I tried bwa aln with a short 23 bp sequence taken from the human genome, using the parameters -l 23 -k 2, but it did not recover the location of the 23 bp. Does bwa work with sequences of this length?

I tried BLAST, but I get back a lot of results and I can't control the maximum edit distance.


PatMatch allows controlling the number of mismatches and whether they may include insertions, deletions, and/or substitutions. A stand-alone version of the software is available, as described in a post written in response to a related question. (In fact, at the referenced resource you can run it in your browser right now via a Jupyter environment served by MyBinder.org.) As far as I can tell, it cannot break that number down further, for example to a maximum of 2 substitutions and 1 deletion.

written 4 months ago by Wayne (240)
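
To make the mismatch definition concrete, here is a minimal Python sketch of a brute-force scan. It is not PatMatch and not how any of the tools in this thread work internally. It counts substitutions only (no insertions or deletions), treats an N in the query as matching any base, and checks both strands. The function names and the toy "genome" string are made up for illustration, and a plain Python loop like this would be far too slow for a whole human genome.

```python
# Brute-force scan: substitutions only, 'N' in the query matches any base.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(query, window):
    """Count substituted positions; an N in the query is never a mismatch."""
    return sum(1 for q, w in zip(query, window) if q != "N" and q != w)


def find_hits(pattern, genome, max_mismatches=2):
    """Yield (0-based start, strand, mismatch count) for every window within the limit."""
    pattern = pattern.upper()
    genome = genome.upper()
    k = len(pattern)
    queries = (("+", pattern), ("-", reverse_complement(pattern)))
    for start in range(len(genome) - k + 1):
        window = genome[start:start + k]
        for strand, query in queries:
            d = mismatches(query, window)
            if d <= max_mismatches:
                yield start, strand, d


target = "GGAGCGAGCGGAGCGGTACANGG"
# Toy sequence (invented): one exact site, then one site with 2 substitutions.
toy = "TTTT" + "GGAGCGAGCGGAGCGGTACATGG" + "AAAA" + "GGAGCGTGCGGAGCGGTACACGC" + "TT"

for hit in find_hits(target, toy, max_mismatches=2):
    print(hit)
```

Allowing insertions and deletions as well would need an edit-distance comparison instead of the position-by-position count used here.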

But it looks like PatMatch only works for Arabidopsis.

written 23 days ago by chahat_u (50)

@chahat_u PatMatch definitely isn't limited to Arabidopsis. Look at the other post I pointed at here. There are several web sites offering PatMatch working as a web tool for quite a few organisms beyond Arabidopsis. I list the ones I could find here. Additionally, as long as you have the sequence and go to https://github.com/fomightez/patmatch-binder and launch a binder session there, you can follow along with the example I set up and use another genome.

written 9 days ago by Wayne (240)

Maximilian Haeussler (1.3k), UCSC, wrote 3.6 years ago:

Yes, bwa will find it, but you need to change the parameters. Do not use the seeded mode; use the slower -N mode instead:

bwa aln -n 4 -o 0 -k 4 -N

The Sanger CRISPR site uses more or less these parameters.


Hi, I tried your method to find the genomic location of a DNA sequence in the hg19 genome, and I ran the following command:

bwa aln -n 4 -o 0 -k 4 -N hg19.fasta testmotif.fq > out.sai

But the out.sai file seemed to contain only illegible characters:

SAI  ÄÑÄø ˇˇˇ

Do you have some idea as to what could be going wrong?

written 22 days ago by chahat_u (50)

Jeremy Leipzig (17k), Philadelphia, PA, wrote 5.1 years ago:
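
For context on the unreadable output: the .sai file that bwa aln writes is a binary intermediate, so it is not meant to be opened as text. It has to be converted to SAM with bwa samse, and the reference needs a bwa index beforehand. The sketch below only assembles the three command lines in Python so the order of the steps is explicit; nothing is executed unless you call the optional run_all helper. The helper names, the output prefix, and the value passed to samse -n (how many alternative hits to list in the XA tag) are assumptions chosen for illustration.

```python
import shlex
import subprocess


def bwa_commands(reference, reads, prefix, max_diff=4, seed_diff=4, max_xa_hits=100):
    """Return (argv, stdout_path) pairs for index, aln and samse; nothing is run here."""
    sai = f"{prefix}.sai"
    sam = f"{prefix}.sam"
    return [
        # Build the index once per reference genome.
        (["bwa", "index", reference], None),
        # Same options as in the answer above; the output is a binary .sai file.
        (["bwa", "aln", "-n", str(max_diff), "-o", "0", "-k", str(seed_diff), "-N",
          reference, reads], sai),
        # Convert the binary .sai into readable SAM.
        # max_xa_hits is an arbitrary illustrative value, not a recommendation.
        (["bwa", "samse", "-n", str(max_xa_hits), reference, sai, reads], sam),
    ]


def run_all(commands):
    """Optional: execute the commands in order (requires bwa on the PATH)."""
    for argv, out_path in commands:
        if out_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(out_path, "wb") as out:
                subprocess.run(argv, check=True, stdout=out)


# Print the equivalent shell commands for the file names used in the question.
for argv, out_path in bwa_commands("hg19.fasta", "testmotif.fq", "out"):
    line = shlex.join(argv)
    if out_path is not None:
        line += " > " + shlex.quote(out_path)
    print(line)
```

The resulting out.sam is plain text; the reported position, the NM tag (number of differences), and the XA tag (alternative hits, if any) are where to look for additional target sites.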

Vmatch is an excellent general aligner:

The Vmatch large scale sequence analysis software
