Question: Blast Settings For Short Sequences
7.5 years ago, kevin.l.neff (Mayo Clinic College of Medicine) wrote:

I'm searching for short sequences (10-20 bases) in the nt database. When I run blastn, I get no results, regardless of my -evalue setting. Here's the test sequence:

>ponzr
CGCGGTAAAACACATTTG

And I run BLAST as follows:

./blastn -db nt -remote -query test2.seq -task "megablast" -out test2.out

With the default E-value settings I should get plenty of hits, but I get none. I've confirmed on the NCBI web service that hits exist, but it tells me the parameters were automatically adjusted to accommodate my short query. I want to make the same adjustments from the command line, but I'm not sure where to start.

Here's my output:

BLASTN 2.2.26+


Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.



Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
           16,169,102 sequences; 41,381,280,968 total letters



Query= ponzr1

Length=18

RID: XU4WV327016


***** No hits found *****



Lambda     K      H
    1.33    0.621     1.12

Gapped
Lambda     K      H
    1.28    0.460    0.850

Effective search space used: 123416233314


  Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
    Posted date:  Jun 15, 2012 10:12 AM
  Number of letters in database: 41,381,280,968
  Number of sequences in database:  16,169,102



Matrix: blastn matrix 1 -2
Gap Penalties: Existence: 0, Extension: 2.5
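A rough Karlin-Altschul check with the statistics printed above suggests the E-value cutoff is not what hides the hits. Assuming a perfect 18/18 match scores 18 raw points (reward 1, per the "blastn matrix 1 -2" line) and using the gapped Lambda, K and effective search space from this output, the match lands near 34 bits with E of roughly 5.6, under the default cutoff of 10. The more likely culprit is seeding: megablast's default word size (28) is longer than the 18-base query. This is a sketch; BLAST's own length corrections may shift the numbers slightly.

```python
import math

def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Convert a raw score to a bit score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def expect_value(bits: float, search_space: float) -> float:
    """E = effective search space * 2^(-S')."""
    return search_space * 2.0 ** (-bits)

# Gapped Lambda and K plus effective search space, copied from the output above.
LAMBDA, K = 1.28, 0.460
SEARCH_SPACE = 123416233314

raw = 18 * 1  # 18 identical bases at reward +1, no mismatches or gaps
bits = bit_score(raw, LAMBDA, K)
e = expect_value(bits, SEARCH_SPACE)
print(f"bits = {bits:.1f}, E = {e:.1f}")  # bits = 34.4, E = 5.6
```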
7.5 years ago, kevin.l.neff (Mayo Clinic College of Medicine) wrote:

Oh, the option -task "blastn-short" seems to do the trick.
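A likely reason blastn-short helps is seeding: BLAST only extends an alignment from an exact word match of length word_size, so a query shorter than the word size cannot produce a hit whatever -evalue is set to. The sketch below assumes the BLAST+ default word sizes of 28 for megablast, 11 for blastn and 7 for blastn-short (check `blastn -help` for your version; -word_size overrides them).

```python
# Assumed BLAST+ default word sizes per -task (verify with `blastn -help`).
DEFAULT_WORD_SIZE = {
    "megablast": 28,
    "blastn": 11,
    "blastn-short": 7,
}

def can_seed(query_len: int, task: str) -> bool:
    """Necessary (not sufficient) condition for a hit: the query must be at least one word long."""
    return query_len >= DEFAULT_WORD_SIZE[task]

for task in DEFAULT_WORD_SIZE:
    print(f"{task:13s} 18-mer: {can_seed(18, task)}  10-mer: {can_seed(10, task)}")
```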

In my searches against nt it only works for queries of 17 bases or longer, though. I wish this were documented somewhere: the task is mentioned in the command-line manual, but its settings are not.
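The 17-base floor may be an E-value effect rather than a hard limit of the task. Assuming blastn-short's reward 1 / penalty -3 scoring gives lambda of about 1.374 and K of about 0.711 (values chosen because they reproduce the 20.3-bit score for a 10/10 match reported later in this thread) and an effective search space close to the 123416233314 reported for nt in the question, the shortest exact match that can reach E <= 10 comes out at 17 bases. Both parameters are assumptions for illustration; the real search space varies with query length.

```python
import math

# Assumed Karlin-Altschul parameters for blastn-short scoring (reward 1, penalty -3);
# they reproduce the 20.3-bit score of a 10/10 match reported later in this thread.
LAMBDA, K = 1.374, 0.711
# Assumed effective search space: the value reported for the nt search in the question.
SEARCH_SPACE = 123416233314

def min_exact_match_len(search_space: float, evalue: float) -> int:
    """Smallest n with K * search_space * exp(-LAMBDA * n) <= evalue."""
    return math.ceil(math.log(K * search_space / evalue) / LAMBDA)

ten_mer_bits = (LAMBDA * 10 - math.log(K)) / math.log(2)
print(f"10/10 match: {ten_mer_bits:.1f} bits")        # 10/10 match: 20.3 bits
print(min_exact_match_len(SEARCH_SPACE, evalue=10))    # 17
```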

7.5 years ago, kevin.l.neff (Mayo Clinic College of Medicine) wrote:

From http://www.ncbi.nlm.nih.gov/BLAST/Why.shtml:

You can adjust both the word size and the expect value on the standard BLAST pages to work with short sequences. However, we do provide a BLAST page with these values preset to give optimum results with short sequences. This page ("Search for short and nearly exact matches") is linked under the nucleotide BLAST section of the main BLAST page. The adjustments are described in the table below.

Program                             Word Size   Filter Setting  Expect Value
------------------------------------------------------------------------------
Standard Nucleotide BLAST              11         On (DUST)           10
Search for short/near exact matches     7         Off               1000
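On the command line, the "short/near exact matches" row of this table maps roughly onto blastn options as sketched below. This is my own mapping, not an equivalence NCBI documents: -word_size 7, -dust no and -evalue 1000 mirror the three columns (word size 7 is already the blastn-short default, but it is spelled out here), and -task blastn-short also switches to scoring aimed at short queries. The helper name short_query_argv is hypothetical; the resulting list can be passed to subprocess.run once blastn is on your PATH.

```python
import shlex

def short_query_argv(query: str, db: str, remote: bool = False) -> list[str]:
    """Build a blastn command mirroring the web short-query preset (hypothetical helper)."""
    argv = [
        "blastn",
        "-task", "blastn-short",  # task intended for short queries
        "-word_size", "7",        # "Word Size" column
        "-dust", "no",            # "Filter Setting: Off" (no DUST low-complexity masking)
        "-evalue", "1000",        # "Expect Value" column
        "-db", db,
        "-query", query,
    ]
    if remote:
        argv.append("-remote")    # search on NCBI's servers, as in the question
    return argv

print(shlex.join(short_query_argv("test2.seq", "nt", remote=True)))
```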
11 months ago, friist (Bodø, Norway) wrote:

I got "blastn-short" working with queries as short as 10 bases in BLASTN 2.8.1, but that was the limit.

$ echo ">short_query" > short_query.fa && echo "CCATATCACC" >> short_query.fa

I blasted this query against a local database of 249 Ebola genomes:

 $ blastn -task blastn-short -db all_ebola_genomes -query short_query.fa

Partial output:

> KM233035.1 Zaire ebolavirus isolate Ebola virus/H.sapiens-wt/SLE/2014/Makona-EM104, 
complete genome
Length=18912

 Score = 20.3 bits (10),  Expect = 7.2
 Identities = 10/10 (100%), Gaps = 0/10 (0%)
 Strand=Plus/Plus

Query  1    CCATATCACC  10
            ||||||||||
Sbjct  102  CCATATCACC  111


> KM233036.1 Zaire ebolavirus isolate Ebola virus/H.sapiens-wt/SLE/2014/Makona-EM106, 
complete genome
Length=18873

 Score = 20.3 bits (10),  Expect = 7.2
 Identities = 10/10 (100%), Gaps = 0/10 (0%)
 Strand=Plus/Plus

Query  1    CCATATCACC  10
            ||||||||||
Sbjct  98   CCATATCACC  107
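The 10-base limit here also looks like an E-value effect. Under E = K*m*n*exp(-lambda*S), losing one matching base lowers the raw score by 1 and multiplies E by exp(lambda). Assuming lambda of about 1.374 for blastn-short's 1/-3 scoring (an assumed value, not shown in this partial output) and ignoring the small change in effective query length, a perfect 9-mer against the same database would sit near E = 28, above the default -evalue of 10, even though a 7-base word can still seed it. Raising -evalue may recover such hits.

```python
import math

LAMBDA = 1.374        # assumed lambda for reward 1 / penalty -3 scoring
E_10MER = 7.2         # Expect reported above for the 10/10 hits
DEFAULT_EVALUE = 10.0

def expect_for_shorter_match(e_ref: float, bases_removed: int) -> float:
    """Each lost matching base removes 1 raw point, so E grows by a factor of exp(LAMBDA)."""
    return e_ref * math.exp(LAMBDA * bases_removed)

e_9mer = expect_for_shorter_match(E_10MER, 1)
print(f"9-mer E ~ {e_9mer:.1f}; reported with -evalue 10? {e_9mer <= DEFAULT_EVALUE}")
```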