BLAST a short sequence with multiple N bases
3 months ago
Nicolas • 0

Hello!

I am trying to find a promoter in a genome assembly. I have the promoter sequence:

GGTTGTNNNNNNNNNACAACC

Whenever I search for this sequence with the following command:

blastn -query=Prv211.fasta -subject=mtbgenome.fna -task=blastn-short -out=output.txt -evalue=10


I get no matches:

Database: User specified sequence set (Input: mtbgenome.fna).
1 sequences; 4,411,532 total letters

Query=
Length=21

***** No hits found *****

Lambda      K        H
1.37    0.711     1.31

Gapped
Lambda      K        H
1.37    0.711     1.31

Effective search space used: 35292152

Database: User specified sequence set (Input: mtbgenome.fna).
Posted date:  Unknown
Number of letters in database: 4,411,532
Number of sequences in database:  1

Matrix: blastn matrix 1 -3
Gap Penalties: Existence: 5, Extension: 2


On the other hand, when I use the actual promoter sequence (with the real bases in place of the Ns), BLAST finds the promoter without problems.

Does anyone know why this is happening?

Thanks!

promoter blastn short • 189 views

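My feeling is that for BLAST to work properly it needs a sufficiently long stretch of exactly matching sequence to seed an alignment. In your query the Ns leave only two 6-base flanks (GGTTGT and ACAACC), which is shorter than the default blastn-short word size of 7 as far as I know, so that may be why nothing is found. Perhaps try playing with the seed length (-word_size) and other parameters like that.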

In general, if you are looking for a pattern, you could try a tool that matches patterns rather than performing alignments, such as seqkit locate:

https://bioinf.shenwei.me/seqkit/usage/#locate
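
If you just want to see what pattern matching means here, below is a minimal Python sketch of the idea. It is not how seqkit does it internally; it only assumes that each IUPAC code (N included) can be turned into a regular-expression character class and searched on both strands. The names find_motif and reverse_complement and the toy genome string are made up for illustration; for the real genome you would first read the sequence from mtbgenome.fna into a string.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the ambiguity codes
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(motif, genome):
    """Hypothetical helper: locate a degenerate motif on both strands.

    Returns a sorted list of (start, end, strand, matched) tuples with
    1-based inclusive coordinates on the forward strand. `matched` is
    the forward-strand sequence at that location.
    """
    genome = genome.upper()
    motif = motif.upper()
    hits = []
    for strand, pat in (("+", motif), ("-", reverse_complement(motif))):
        # A lookahead group lets overlapping matches be reported too
        regex = re.compile("(?=(" + "".join(IUPAC[b] for b in pat) + "))")
        for m in regex.finditer(genome):
            hits.append((m.start() + 1, m.start() + len(pat), strand, m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence standing in for the genome (an assumption for the demo)
    toy_genome = "AAAGGTTGTACGTACGTAACAACCTTT"
    for hit in find_motif("GGTTGTNNNNNNNNNACAACC", toy_genome):
        print(*hit, sep="\t")
```

Note that this particular motif is its own reverse complement (GGTTGT pairs with ACAACC), so every location is reported once per strand.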