Question: blast+ with a degenerate nucleotide
Asked 4.4 years ago by kaaschr (Denmark):

I would like to find the places in the genome where my CRISPR guide RNA might cause off-target cleavage.

Example target: AGGGGTCCTTTCTGAAGTCCAGG
It usually binds to sequences that are identical to it in the 13 bp upstream of the PAM, i.e. NNNNNNNCTTTCTGAAGTCCNGG

So I would like to search my genome for CTTTCTGAAGTCCNGG with blast+ (on UNIX).

My FASTA file, which I use as the BLAST query:

>AGGGGTCCTTTCTGAAGTCCAGG
CTTTCTGAAGTCCNGG 

The command for blasting the sequence against my genome:

~/software/ncbi-blast-2.2.30+/bin/blastn  -db ~/Cgenomes/Crispr/libs/Cgriseus -query candidate -outfmt 6 -task blastn-short -out temp2 -evalue 100000 -word_size 11

The best results:

AGGGGTCCTTTCTGAAGTCCAGG NW_006886065.1  100.00  13      0       0       1       13      53760   53748      59   26.3
AGGGGTCCTTTCTGAAGTCCAGG NW_006886065.1  100.00  13      0       0       1       13      77566   77554      59   26.3
AGGGGTCCTTTCTGAAGTCCAGG NW_006886135.1  100.00  13      0       0       1       13      478726  478714     59   26.3

So I only get results with up to 13 bp of alignment, i.e. nothing to the right of the degenerate N is used by BLAST.

From NCBI (http://www.ncbi.nlm.nih.gov/blast/Why.shtml):
"Although this alphabet [the degenerate nucleotides] is accepted by BLAST, the BLAST program treats such ambiguities as mismatches in alignment. In short queries, such as primer sequences, these ambiguous bases may prevent BLAST from finding any matches in the database that are as large as the word size."

So should I not be able to get results that are 16 bp long with 1 mismatch? Which parameter would it make sense to tinker with so that blast+ finds the 16 bp sequences in the genome that have 1, 2, or 3 mismatches compared to CTTTCTGAAGTCCNGG?
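
Since BLAST scores the N as a mismatch, one possible workaround is to expand the N into the four concrete bases and submit four queries instead of one. The following is only a sketch of that idea: the helper names `expand_n` and `to_fasta` are made up for illustration, and it handles only N, not the other IUPAC codes. Even with concrete queries, hits containing internal mismatches may still be missed, because BLAST needs an exact seed of `-word_size` bases (blastn-short defaults to 7) and may not extend across a mismatch in such a short alignment.

```python
from itertools import product


def expand_n(seq):
    """Return every sequence obtained by replacing each N with A, C, G or T."""
    choices = ["ACGT" if base == "N" else base for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]


def to_fasta(seqs):
    """Format sequences as FASTA text, using each sequence as its own record ID."""
    return "".join(f">{s}\n{s}\n" for s in seqs)


if __name__ == "__main__":
    # Prints four FASTA records: ...CCAGG, ...CCCGG, ...CCGGG, ...CCTGG
    print(to_fasta(expand_n("CTTTCTGAAGTCCNGG")), end="")
```

The printed records could be saved and passed to `-query` in place of the single-record `candidate` file.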

 


The search doesn't require a gapped alignment, does it? You just need a string comparison with a degenerate alphabet?
 

written 4.4 years ago by Pierre Lindenbaum
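
To make the "string comparison with a degenerate alphabet" idea concrete, here is a minimal sketch that counts mismatches between a degenerate pattern and a concrete sequence of the same length. The function name `count_mismatches` is hypothetical. The table is the standard IUPAC nucleotide code; an ambiguous base in the target (for example an N in the genome) is counted as a mismatch here, which is an assumption of this sketch rather than anything stated in the thread.

```python
# IUPAC nucleotide codes and the concrete bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(pattern, target):
    """Count positions where the target base is not allowed by the pattern code."""
    if len(pattern) != len(target):
        raise ValueError("pattern and target must have the same length")
    return sum(t not in IUPAC[p] for p, t in zip(pattern.upper(), target.upper()))


if __name__ == "__main__":
    print(count_mismatches("CTTTCTGAAGTCCNGG", "CTTTCTGAAGTCCAGG"))  # 0
    print(count_mismatches("CTTTCTGAAGTCCNGG", "CTTTCTGAAGTCCTGA"))  # 1
```

No gaps are considered, so this is a Hamming-style comparison rather than an alignment.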

My goal is to find where in the 2.5 Gbp genome CTTTCTGAAGTCCNGG can align with 0, 1, or 2 mismatches.

But would you not say that blastn is the most user-friendly string-comparison tool out there?

Just to clarify: I am pretty new to bioinformatics, so the solution is probably very straightforward.

 

written 4.4 years ago by kaaschr
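
The goal stated above (every place where CTTTCTGAAGTCCNGG fits with 0, 1, or 2 mismatches) can be sketched as a plain sliding-window scan over both strands, without BLAST. This is an illustration under assumptions, not a recommended tool: the names `scan`, `read_fasta`, and `genome.fa` are hypothetical, only N in the pattern is treated as a wildcard, and a pure-Python loop like this is likely to be very slow on a 2.5 Gbp genome.

```python
def revcomp(seq):
    """Reverse-complement a sequence made of A, C, G, T and N."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def mismatches(pattern, window, limit):
    """Count mismatches (N in the pattern matches anything).

    Counting stops as soon as the count exceeds `limit`, so the value
    is exact only when it is <= limit.
    """
    n = 0
    for p, w in zip(pattern, window):
        if p != "N" and p != w:
            n += 1
            if n > limit:
                break
    return n


def scan(name, seq, pattern, max_mm=2):
    """Yield (name, 1-based start, strand, mismatches, site) for each hit.

    Start and site are always reported on the forward strand of `seq`.
    """
    seq = seq.upper()
    k = len(pattern)
    strands = (("+", pattern), ("-", revcomp(pattern)))
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        for strand, pat in strands:
            mm = mismatches(pat, window, max_mm)
            if mm <= max_mm:
                yield name, i + 1, strand, mm, window


def read_fasta(path):
    """Yield (name, sequence) per FASTA record, one record in memory at a time."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                fields = line[1:].split()
                name, chunks = (fields[0] if fields else ""), []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


if __name__ == "__main__":
    # Small in-memory demo; for a real run, loop over read_fasta("genome.fa")
    # (hypothetical path) and call scan() on each record.
    demo = "AAACTTTCTGAAGTCCAGGTTT"
    for hit in scan("demo", demo, "CTTTCTGAAGTCCNGG", max_mm=2):
        print(*hit, sep="\t")
```

Because the scan compares every window directly, it does not depend on a seed word the way BLAST does, so sites with mismatches anywhere in the 16 bp are reported.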