I would like to find the places in the genome where my CRISPR guide RNA might cause off-target cleavage.
Example target: AGGGGTCCTTTCTGAAGTCCAGG
It usually binds to sequences that are identical to the 13 bp immediately upstream of the PAM, i.e. NNNNNNNCTTTCTGAAGTCCNGG
So I would like to use BLAST+ (on UNIX) to search my genome for CTTTCTGAAGTCCNGG.
My FASTA file, which I BLAST:
The command for BLASTing the sequence against my genome:
~/software/ncbi-blast-2.2.30+/bin/blastn -db ~/Cgenomes/Crispr/libs/Cgriseus -query candidate -outfmt 6 -task blastn-short -out temp2 -evalue 100000 -word_size 11
The best results:
AGGGGTCCTTTCTGAAGTCCAGG NW_006886065.1 100.00 13 0 0 1 13 53760 53748 59 26.3
AGGGGTCCTTTCTGAAGTCCAGG NW_006886065.1 100.00 13 0 0 1 13 77566 77554 59 26.3
AGGGGTCCTTTCTGAAGTCCAGG NW_006886135.1 100.00 13 0 0 1 13 478726 478714 59 26.3
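For readers less familiar with tabular BLAST output, here is a minimal Python sketch that labels the fields of one row. It assumes the default 12-column layout of `-outfmt 6` (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the function name `parse_outfmt6_line` is made up for illustration.

```python
# Default column names for BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

def parse_outfmt6_line(line):
    """Split one whitespace-separated outfmt 6 row into a labeled dict."""
    hit = dict(zip(OUTFMT6_COLUMNS, line.split()))
    for key in ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"):
        hit[key] = int(hit[key])
    for key in ("pident", "evalue", "bitscore"):
        hit[key] = float(hit[key])
    # sstart > send means the hit lies on the minus strand of the subject.
    hit["strand"] = "minus" if hit["sstart"] > hit["send"] else "plus"
    return hit

row = "AGGGGTCCTTTCTGAAGTCCAGG NW_006886065.1 100.00 13 0 0 1 13 53760 53748 59 26.3"
hit = parse_outfmt6_line(row)
print(hit["length"], hit["qstart"], hit["qend"], hit["strand"])
```

Read this way, each hit above is a perfect 13 bp match covering query positions 1 to 13, on the minus strand of the scaffold.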
So I only get alignments of up to 13 bp, which means that nothing to the right of the degenerate N is used by BLAST.
From NCBI (http://www.ncbi.nlm.nih.gov/blast/Why.shtml)
"Although this alphabet [the degenerate nucleotides] is accepted by BLAST, the BLAST program treats such ambiguities as mismatches in alignment. In short queries, such as primer sequences, these ambiguous bases may prevent BLAST from finding any matches in the database that are as large as the word size."
So shouldn't I be able to get 16 bp results with 1 mismatch? Which parameter would it make sense to tinker with so that BLAST+ finds the 16 bp sequences in the genome that have 1, 2 or 3 mismatches compared to CTTTCTGAAGTCCNGG?
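To make the desired search concrete, here is a minimal Python sketch of it. It is not a BLAST setting and not a recommended tool, only an illustration under these assumptions: N in the pattern matches any base, only substitutions are counted (no gaps), and both strands are scanned. The names `revcomp`, `mismatches` and `find_sites` are hypothetical, and a plain loop like this would be slow on a whole genome.

```python
def revcomp(seq):
    """Reverse complement; N stays N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))

def mismatches(pattern, window):
    """Count differing positions; an N in the pattern matches any base."""
    return sum(1 for p, w in zip(pattern, window) if p != "N" and p != w)

def find_sites(genome, pattern, max_mismatches=3):
    """Yield (start, strand, mismatch_count, plus_strand_window) for each hit."""
    k = len(pattern)
    genome = genome.upper()
    for strand, pat in (("+", pattern), ("-", revcomp(pattern))):
        for i in range(len(genome) - k + 1):
            window = genome[i:i + k]
            n = mismatches(pat, window)
            if n <= max_mismatches:
                yield i, strand, n, window

# Toy sequence: one exact site, then one site with a single substitution.
toy = "TTTT" + "CTTTCTGAAGTCCAGG" + "AAAA" + "CTTTCTGATGTCCTGG" + "CCCC"
hits = list(find_sites(toy, "CTTTCTGAAGTCCNGG", max_mismatches=1))
for start, strand, n, site in hits:
    print(start, strand, n, site)
```

Note that this counts a mismatch inside the GG of the PAM the same as one in the 13 bp seed, which may not be what is wanted biologically.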