I would like to find the places in the genome where my CRISPR guide RNA might cause off-target cleavage.
Example target: AGGGGTCCTTTCTGAAGTCCAGG
The guide usually binds to sequences that are identical to the 13 bp immediately upstream of the PAM, i.e. NNNNNNNCTTTCTGAAGTCCNGG
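The site notation above is an IUPAC pattern in which N stands for any base. As a minimal sketch (Python is used only for illustration, and `iupac_to_regex` is a hypothetical helper, not part of BLAST+), the pattern can be translated into a regular expression and checked against the example target. Note that this only finds exact matches; it does not tolerate mismatches.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern: str) -> str:
    """Translate a degenerate IUPAC DNA pattern into a regular expression."""
    return "".join(IUPAC[base] for base in pattern.upper())


site = iupac_to_regex("NNNNNNNCTTTCTGAAGTCCNGG")
target = "AGGGGTCCTTTCTGAAGTCCAGG"
print(bool(re.fullmatch(site, target)))  # True
```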
So I would like to search my genome with BLAST+ (on UNIX) for CTTTCTGAAGTCCNGG.
My query FASTA file (candidate):
>AGGGGTCCTTTCTGAAGTCCAGG
CTTTCTGAAGTCCNGG
The command I use to BLAST the sequence against my genome:
~/software/ncbi-blast-2.2.30+/bin/blastn -db ~/Cgenomes/Crispr/libs/Cgriseus -query candidate -outfmt 6 -task blastn-short -out temp2 -evalue 100000 -word_size 11
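The best hits (-outfmt 6 columns: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore):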
AGGGGTCCTTTCTGAAGTCCAGG NW_006886065.1 100.00 13 0 0 1 13 53760 53748 59 26.3
AGGGGTCCTTTCTGAAGTCCAGG NW_006886065.1 100.00 13 0 0 1 13 77566 77554 59 26.3
AGGGGTCCTTTCTGAAGTCCAGG NW_006886135.1 100.00 13 0 0 1 13 478726 478714 59 26.3
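To read these hits programmatically, each -outfmt 6 line can be split into the twelve default columns. This is a small illustrative sketch (`parse_hit` and `Hit` are hypothetical names); it also makes explicit that an sstart larger than send means the hit lies on the minus strand, as all three hits above do.

```python
from typing import NamedTuple

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore")


class Hit(NamedTuple):
    sseqid: str
    length: int
    mismatch: int
    sstart: int
    send: int
    strand: str


def parse_hit(line: str) -> Hit:
    """Parse one -outfmt 6 line; sstart > send indicates a minus-strand hit."""
    row = dict(zip(COLUMNS, line.split()))
    sstart, send = int(row["sstart"]), int(row["send"])
    return Hit(row["sseqid"], int(row["length"]), int(row["mismatch"]),
               sstart, send, "-" if sstart > send else "+")


hit = parse_hit("AGGGGTCCTTTCTGAAGTCCAGG NW_006886065.1 100.00 13 0 0 1 13 53760 53748 59 26.3")
print(hit.length, hit.strand)  # 13 -
```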
So I only get alignments of at most 13 bp: nothing to the right of the degenerate N is used by BLAST.
From NCBI (http://www.ncbi.nlm.nih.gov/blast/Why.shtml)
Although this alphabet [the degenerate nucleotides] is accepted by BLAST, the BLAST program treats such ambiguities as mismatches in alignment. In short queries, such as primer sequences, these ambiguous bases may prevent BLAST from finding any matches in the database that are as large as the word size.
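Because the N is scored as a mismatch, the local alignment stops wherever extending it no longer pays off. Assuming the blastn-short defaults of +1 per match and -3 per mismatch (worth confirming with `blastn -help` for your BLAST+ version), the 13 bp stretch scores 13, whereas the full 16 bp alignment scores 15 - 3 = 12, so the shorter alignment wins. The sketch below is a simplified, ungapped model of that extension step (`best_ungapped_segment` is a hypothetical helper, not BLAST's actual code) that finds the highest-scoring stretch of an alignment:

```python
def best_ungapped_segment(columns: str, reward: int = 1, penalty: int = -3):
    """Return (score, start, end) of the highest-scoring ungapped segment.

    `columns` summarises an alignment: '|' for a match, any other character
    for a mismatch (BLAST scores an ambiguous N as a mismatch).
    """
    best = (0, 0, 0)
    score, start = 0, 0
    for i, col in enumerate(columns):
        if score <= 0:  # a non-positive prefix never helps: restart here
            score, start = 0, i
        score += reward if col == "|" else penalty
        if score > best[0]:
            best = (score, start, i + 1)
    return best


# Query CTTTCTGAAGTCCNGG vs genome CTTTCTGAAGTCCAGG: 13 matches, N, then GG.
query_vs_site = "|" * 13 + "x" + "||"
print(best_ungapped_segment(query_vs_site))              # (13, 0, 13)
print(best_ungapped_segment(query_vs_site, penalty=-1))  # (14, 0, 16)
```

The blastn options controlling this trade-off are -reward and -penalty. A milder mismatch penalty (the -1 above is purely illustrative) would let the alignment run through the N, but which reward/penalty pairs blastn accepts, and with which gap costs, should be checked in the BLAST+ documentation.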
So shouldn't I be able to get 16 bp hits with 1 mismatch? Which parameter would make sense to tweak so that BLAST+ finds the 16 bp sequences in the genome that have 1, 2 or 3 mismatches compared to CTTTCTGAAGTCCNGG?
The search doesn't require a gapped alignment, does it? Don't you just need a string comparison with a degenerate alphabet?
My goal is to find where in the 2.5 Gbp genome CTTTCTGAAGTCCNGG aligns with 0, 1 or 2 mismatches.
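Since no gapped alignment is needed, the search reduces to counting mismatches between every 16 bp window of the genome (on both strands) and the pattern, with N matching any base. Below is a minimal sketch, assuming the genome is a plain FASTA file; `read_fasta`, `mismatches` and `scan` are hypothetical helpers written for this illustration. Pure Python would be very slow on a 2.5 Gbp genome, so treat it as a statement of the logic rather than a production tool.

```python
import io
from typing import Iterator, TextIO

# Complement table; N stays N so the degenerate position survives.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (record_id, sequence) pairs from a FASTA stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line.upper())  # also unmasks soft-masked bases
    if name is not None:
        yield name, "".join(chunks)


def mismatches(window: str, pattern: str, limit: int) -> int:
    """Count mismatches, letting N in the pattern match any base.

    Stops counting as soon as `limit` is exceeded.
    """
    count = 0
    for base, want in zip(window, pattern):
        if want != "N" and base != want:
            count += 1
            if count > limit:
                break
    return count


def scan(handle: TextIO, pattern: str, max_mm: int = 2):
    """Yield (record, 1-based start, strand, mismatches, window) for every
    window within `max_mm` mismatches of `pattern` on either strand.
    The window is always reported as forward-strand sequence."""
    k = len(pattern)
    rc_pattern = pattern.translate(COMPLEMENT)[::-1]
    for name, seq in read_fasta(handle):
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            for strand, pat in (("+", pattern), ("-", rc_pattern)):
                mm = mismatches(window, pat, max_mm)
                if mm <= max_mm:
                    yield name, i + 1, strand, mm, window


demo = io.StringIO(">chr1\nAAAACTTTCTGAAGTCCAGGTTTT\n")
for hit in scan(demo, "CTTTCTGAAGTCCNGG", max_mm=2):
    print(*hit)  # chr1 5 + 0 CTTTCTGAAGTCCAGG
```

To run it on a real genome you would pass an open file handle (for example `open("genome.fa")`, a hypothetical path) instead of the in-memory demo.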
But wouldn't you say that blastn is the most user-friendly string-comparison tool out there?
Just to clarify: I am pretty new to bioinformatics, so the solution is probably very straightforward.
Hey, I have the same question. Were you able to solve this problem?