Issues in running command line blastn with -task blastn-short option
1
0
Entering edit mode
4.2 years ago
Louis Kok ▴ 30

Hi,

I am using BLAST command line tool version 2.6.0.

I have a short sequence database db.fa which consists of 32 sequences with 22 bp. I executed the query data query.fa against db.fa and it shows no hit. However, when I reduce the database to random 26 sequences and created db.partial.fa, it shows hits. I was wondering if there is any size limit for short sequence database to be used?

Please find below the execution:

(1) DB with 32 sequences
This does not produce hit.

makeblastdb -in db.fa -dbtype nucl -parse_seqids

blastn -task blastn-short -query query.fa  -db db.fa -out query.db.blastn

(2) DB with 26 sequences
This produces hits.

makeblastdb -in db.partial.fa -dbtype nucl -parse_seqids

blastn -task blastn-short -query query.fa  -db db.partial.fa -out query.db.partial.blastn

Thanks for your help. I have some test files but there is no option to attach the files (I hope Biostars would allow this in the future).

[Edit]

Since I couldn't attach any file, I will explain what I have used. I used AB011515 from NCBI as query.fa. I generated 32 sequences with exactly same bases of CTTGGTCATTTAGAGGAAGTAA as db.fa, and generated 26 sequences with exactly same bases of above as db.partial.fa, respectively. Hope that explains.

blast ncbi • 996 views
ADD COMMENT
0
Entering edit mode

let me first point out that this a very old version for blast you're using (we're at 2.10.0 for the moment)

Do you get any output when running the 'normal' blastn? (blastn-short actually points to short input query's and not short sequences in the DB)

Can you post the output of the makeblastdb cmd? moreover, for the version 2.2.18 the command to use to format a blastdb was formatdb (and not makeblastdb)

ADD REPLY
0
Entering edit mode

My apologies. It was version 2.6.0 not 2.2.18. I will correct it. It produces no hit using normal blastn as well.

The makeblastdb output is as below:

makeblastdb -in db.fa -dbtype nucl -parse_seqids
Building a new DB, current time: 02/06/2020 08:10:51 
New DB name:   /opt/louis/db.fa
New DB title:  db.fa
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 32 sequences in 0.0299339 seconds.
ADD REPLY
0
Entering edit mode

OK, looks all fine.

Can you post what the hits look like (in the second try you mention)?

Are the 26 sequences a subpart of the 32 or are they different sequences? Could it be there simply are not hits?

ADD REPLY
0
Entering edit mode

The original 32 sequences are different. For simplicity, I created a database of 32 duplicates as db.fa and 26 duplicates for db.partial.fa. Therefore, for db.fa, it looks like this (As you can see, they are all the same):

>DB_01
CTTGGTCATTTAGAGGAAGTAA
>DB_02
CTTGGTCATTTAGAGGAAGTAA
>DB_03
CTTGGTCATTTAGAGGAAGTAA
>DB_04
CTTGGTCATTTAGAGGAAGTAA
>DB_05
CTTGGTCATTTAGAGGAAGTAA
.
.
.
>DB_32
CTTGGTCATTTAGAGGAAGTAA

and db.partial.fa is from DB_01 to DB_26 of db.fa.

Please find below the blast output: using db.fa

BLASTN 2.6.0+


Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.



Database: db.fa
           32 sequences; 704 total letters



Query= AB011515

Length=630


***** No hits found *****



Lambda      K        H
    1.37    0.711     1.31 

Gapped
Lambda      K        H
    1.37    0.711     1.31 

Effective search space used: 258336



  Database: db.fa
    Posted date:  Feb 6, 2020  9:25 AM
  Number of letters in database: 704
  Number of sequences in database:  32



Matrix: blastn matrix 1 -3
Gap Penalties: Existence: 5, Extension: 2

using db.partial.fa

BLASTN 2.6.0+


Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.



Database: db.partial.fa
           26 sequences; 572 total letters



Query= AB011515

Length=630
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

DB_26                                                                 14.4    9.9  
DB_25                                                                 14.4    9.9  
DB_24                                                                 14.4    9.9  
DB_23                                                                 14.4    9.9  
DB_22                                                                 14.4    9.9  
DB_21                                                                 14.4    9.9  
DB_20                                                                 14.4    9.9  
DB_19                                                                 14.4    9.9  
DB_18                                                                 14.4    9.9  
DB_17                                                                 14.4    9.9  
DB_16                                                                 14.4    9.9  
DB_15                                                                 14.4    9.9  
.
.
.
ADD REPLY
0
Entering edit mode

Can I, on the side, ask what the goal of all this is? So far it is making little (biological) sense to me :/

ADD REPLY
0
Entering edit mode

@lieven.sterck. Nothing much special. I was trying to match sequences with different composition of primers and realised that I could not do it with more primer sequences and had not clue why.

ADD REPLY
0
Entering edit mode
4.2 years ago

ok,

seeing this I would think it's due to the score metrics blast calculates.

As a test can you increase the default e-value threshold (10) for the first blast run (with the 32 seq DB) and see if you get any hits then?

ADD COMMENT
1
Entering edit mode

It works when I tried e-value cutoff of 13. The output is as below:

BLASTN 2.6.0+


Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.



Database: db.fa
           32 sequences; 704 total letters



Query= AB011515

Length=630
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

DB_26                                                                 14.4    12   
DB_18                                                                 14.4    12   
DB_10                                                                 14.4    12   
DB_02                                                                 14.4    12   
DB_32                                                                 14.4    12   
DB_31                                                                 14.4    12   
DB_30                                                                 14.4    12   
DB_29                                                                 14.4    12   
DB_28                                                                 14.4    12   
DB_27                                                                 14.4    12   
DB_25                                                                 14.4    12   
DB_24                                                                 14.4    12

I will adjust e-value in this case. Thanks a lot for your help!

ADD REPLY
0
Entering edit mode

voila, there you have it.

with the default of 10 you did not got any hits as they scored even worse than 10 (which is already an absurd threshold). On the reasons why (technically), this would likely take us to far but in a nutshell it has to do with the 'effective' DB size' that changed between your 32 and 26 DB version.

ADD REPLY

Login before adding your answer.

Traffic: 2058 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6