Blast Settings For Short Sequences
9.6 years ago
kevin.l.neff ▴ 310

I'm searching for short sequences in nt. By short, I mean 10-20 bases. When I run blastn, I get no results, regardless of my -evalue settings. Here's the test sequence:

>ponzr
CGCGGTAAAACACATTTG

And I run BLAST as follows:

./blastn -db nt -remote -query test2.seq -task "megablast" -out test2.out

With the default -evalue setting I expected a lot of hits, but I get none. The NCBI web service does find hits for this query, but it tells me that it adjusted the parameters to accommodate my short query. I want to make these adjustments from the command line, but I'm not sure where to start.

Here's my output:

BLASTN 2.2.26+


Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.



Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
           16,169,102 sequences; 41,381,280,968 total letters



Query= ponzr1

Length=18

RID: XU4WV327016


***** No hits found *****



Lambda     K      H
    1.33    0.621     1.12

Gapped
Lambda     K      H
    1.28    0.460    0.850

Effective search space used: 123416233314


  Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
    Posted date:  Jun 15, 2012 10:12 AM
  Number of letters in database: 41,381,280,968
  Number of sequences in database:  16,169,102



Matrix: blastn matrix 1 -2
Gap Penalties: Existence: 0, Extension: 2.5
9.6 years ago
kevin.l.neff ▴ 310

Oh, the option -task "blastn-short" seems to do the trick.

It only works for queries of 17 bases or longer, though. I wish this stuff were documented somewhere. The task is mentioned in the command-line manual, but its settings are not.
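
A likely reason the megablast run in the question found nothing: megablast's default word size is 28, which is longer than the 18-base query, so no initial seed match can exist. The sketch below illustrates that check. The word sizes are the BLAST+ defaults as I understand them from the manual (megablast 28, dc-megablast 11, blastn 11, blastn-short 7); the function names are hypothetical helpers, not part of BLAST.

```shell
#!/bin/sh
# Default word sizes per blastn task (assumed from the BLAST+ manual).
default_word_size() {
    case "$1" in
        megablast)    echo 28 ;;
        dc-megablast) echo 11 ;;
        blastn)       echo 11 ;;
        blastn-short) echo 7 ;;
        *)            return 1 ;;
    esac
}

# A query shorter than the word size cannot produce an initial seed hit.
can_seed() {
    task=$1
    query_len=$2
    ws=$(default_word_size "$task") || return 2
    [ "$query_len" -ge "$ws" ]
}

query="CGCGGTAAAACACATTTG"   # test sequence from the question (18 bases)
for task in megablast blastn blastn-short; do
    if can_seed "$task" "${#query}"; then
        echo "$task: seeding possible"
    else
        echo "$task: query shorter than word size"
    fi
done
```

This only covers seeding. A task that can seed may still report nothing if the hit's E-value is above the -evalue cutoff.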


Kevin, thank you so much for this answer. This option worked and lifted me from a pit of despair =)

9.6 years ago
kevin.l.neff ▴ 310

http://www.ncbi.nlm.nih.gov/BLAST/Why.shtml

You can adjust both the word size and the expect value on the standard BLAST pages to work with short sequences. However, we do provide a BLAST page with these values preset to give optimum results with short sequences. This page ("Search for short and nearly exact matches") is linked under the nucleotide BLAST section of the main BLAST page. The adjustments are described in the table below.

Program                             Word Size   Filter Setting  Expect Value
------------------------------------------------------------------------------
Standard Nucleotide BLAST              11         On (DUST)           10
Search for short/near exact matches     7         Off               1000
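
As a rough sketch, the "short/near exact matches" row of the table can be spelled out with the blastn options -word_size, -dust and -evalue. Whether -task blastn-short already applies some of these by default is not stated in this thread, so they are set explicitly here. The RUN_BLAST switch is a hypothetical convenience so the script can be inspected without launching a remote search.

```shell
#!/bin/sh
# Write the test query from the question to a FASTA file.
printf '>ponzr\nCGCGGTAAAACACATTTG\n' > test2.seq

# Settings taken from the NCBI table quoted above:
# word size 7, DUST filter off, expect value 1000.
BLAST_ARGS="-task blastn-short -word_size 7 -dust no -evalue 1000"

# Assemble the command and print it for review.
CMD="blastn -db nt -remote -query test2.seq $BLAST_ARGS -out test2.out"
echo "$CMD"

# Run it only on request (RUN_BLAST=1) and only if blastn is on the PATH.
if [ "${RUN_BLAST:-0}" = "1" ] && command -v blastn >/dev/null 2>&1; then
    $CMD
fi
```

Raising -evalue matters because a perfect match of 10-20 bases against a database the size of nt is expected to occur by chance, so its E-value is large.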
3.0 years ago
friist ▴ 20

I got "blastn-short" working with queries as short as 10 bases on BLASTN 2.8.1, but that was the limit.

$ echo ">short_query" > short_query.fa && echo "CCATATCACC" >> short_query.fa

I blasted this query against a database of 249 Ebola genomes:

$ blastn -task blastn-short -db all_ebola_genomes -query short_query.fa

Partial output:

> KM233035.1 Zaire ebolavirus isolate Ebola virus/H.sapiens-wt/SLE/2014/Makona-EM104, 
complete genome
Length=18912

 Score = 20.3 bits (10),  Expect = 7.2
 Identities = 10/10 (100%), Gaps = 0/10 (0%)
 Strand=Plus/Plus

Query  1    CCATATCACC  10
            ||||||||||
Sbjct  102  CCATATCACC  111


> KM233036.1 Zaire ebolavirus isolate Ebola virus/H.sapiens-wt/SLE/2014/Makona-EM106, 
complete genome
Length=18873

 Score = 20.3 bits (10),  Expect = 7.2
 Identities = 10/10 (100%), Gaps = 0/10 (0%)
 Strand=Plus/Plus

Query  1    CCATATCACC  10
            ||||||||||
Sbjct  98   CCATATCACC  107
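
Since queries below some length silently return nothing, it may help to check query lengths before searching. This is a minimal sketch: the cutoff of 10 is only the limit observed above on BLASTN 2.8.1, not a documented value, and fasta_lengths is a hypothetical helper.

```shell
#!/bin/sh
# Recreate the 10-base query used above.
printf '>short_query\nCCATATCACC\n' > short_query.fa

# Assumed cutoff, taken from the observation above (BLASTN 2.8.1).
MIN_LEN=10

# Print "name<TAB>length<TAB>status" for each FASTA record in file $2,
# flagging records shorter than the cutoff $1.
fasta_lengths() {
    awk -v min="$1" '
        function report() {
            if (name != "")
                printf "%s\t%d\t%s\n", name, len, (len >= min ? "ok" : "too_short")
        }
        /^>/ { report(); name = substr($1, 2); len = 0; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { report() }
    ' "$2"
}

fasta_lengths "$MIN_LEN" short_query.fa
```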