blast short sequence with multiple N bases
3 months ago
Nicolas • 0

Hello!

I am trying to find a promoter inside a genome assembly. I have the promoter sequence:

GGTTGTNNNNNNNNNACAACC

Whenever I try to find this sequence with the following command:

blastn -query=Prv211.fasta -subject=mtbgenome.fna -task=blastn-short -out=output.txt -evalue=10

I get no matches:

Database: User specified sequence set (Input: mtbgenome.fna).
       1 sequences; 4,411,532 total letters



Query= 
Length=21


***** No hits found *****



Lambda      K        H
    1.37    0.711     1.31 

Gapped
Lambda      K        H
    1.37    0.711     1.31 

Effective search space used: 35292152


  Database: User specified sequence set (Input: mtbgenome.fna).
    Posted date:  Unknown
  Number of letters in database: 4,411,532
  Number of sequences in database:  1



Matrix: blastn matrix 1 -3
Gap Penalties: Existence: 5, Extension: 2

On the other hand, when I use the actual sequence of the promoter (with the real bases instead of the Ns), BLAST finds the promoter without problems.

Does anyone know why this is happening?

Thanks!

Edit: I increased the e-value to 10, still no results.

promoter blastn short • 189 views

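My feeling is that for BLAST to work properly it needs a sufficiently long stretch of exactly matching sequence to seed an alignment. As far as I know, blastn-short seeds on exact words of 7 bases by default, and the Ns leave only two 6-base stretches in your query (GGTTGT and ACAACC), so no seed can be found. In your case, perhaps try playing with the seed length (-word_size) and related parameters.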
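Even with a shorter seed, the e-value cutoff may still get in the way. Here is a rough sketch of the standard Karlin-Altschul estimate, E = K * (search space) * exp(-lambda * S), using the lambda, K and effective search space printed in the report above. The raw scores are my assumption: +1 per matching base under the 1/-3 matrix, so a single 6-base flank scores 6 and the full 21-base promoter scores 21. How BLAST actually scores the N positions is not modelled here.

```python
import math


def expected_hits(raw_score, lam, k, search_space):
    """Karlin-Altschul expectation: E = K * search_space * exp(-lambda * S)."""
    return k * search_space * math.exp(-lam * raw_score)


# Values copied from the BLAST report in the question.
LAMBDA = 1.37
K = 0.711
SEARCH_SPACE = 35292152  # "Effective search space used"

# Assumed raw scores: one exact 6-base flank, both flanks, the full 21-mer.
for score in (6, 12, 21):
    e_value = expected_hits(score, LAMBDA, K, SEARCH_SPACE)
    print(f"raw score {score:2d}: E = {e_value:.3g}")
```

Under these assumptions a lone 6-base match has an expected count in the thousands for a 4.4 Mb genome, far above -evalue=10, while the full 21-base match is well below it. That would be consistent with the promoter being found only when the real bases are used.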

In general, if you are looking for a pattern, you could try another tool that matches patterns rather than performing alignments, for example seqkit locate:

https://bioinf.shenwei.me/seqkit/usage/#locate
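If you would rather not install anything, the same idea can be sketched in a few lines of Python: turn each IUPAC code into a regular-expression character class and scan both strands. This is only a minimal illustration, not how seqkit works internally. The function names (find_motif, read_single_fasta, and so on) are mine, and the toy genome string is made up for the example.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_to_regex(motif):
    # Wrapped in a lookahead so overlapping matches are reported too.
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_motif(motif, genome):
    """Return sorted (start, end, strand, forward_strand_sequence) tuples.

    Coordinates are 1-based, inclusive, and always on the forward strand.
    """
    genome = genome.upper()
    hits = set()
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        for m in motif_to_regex(pattern).finditer(genome):
            hits.add((m.start() + 1, m.start() + len(motif), strand, m.group(1)))
    return sorted(hits)


def read_single_fasta(path):
    """Read a one-record FASTA file into a single uppercase string."""
    with open(path) as handle:
        return "".join(
            line.strip() for line in handle if not line.startswith(">")
        ).upper()


# Toy example: made-up sequence with the promoter pattern embedded at position 4.
toy_genome = "TTT" + "GGTTGT" + "ACGTACGTA" + "ACAACC" + "AAA"
for hit in find_motif("GGTTGTNNNNNNNNNACAACC", toy_genome):
    print(hit)
```

For the real data you would pass read_single_fasta("mtbgenome.fna") as the genome. Note that GGTTGTNNNNNNNNNACAACC is its own reverse complement, so every site is reported once per strand.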
