Question: Blast Settings For Short Sequences
6.8 years ago by
kevin.l.neff200
Mayo Clinic College of Medicine
kevin.l.neff200 wrote:

I'm searching for short sequences in nt. By short, I mean 10-20 bases. When I run blastn, I get no results, regardless of my -evalue settings. Here's the test sequence:

>ponzr
CGCGGTAAAACACATTTG

And I run BLAST as follows:

./blastn -db nt -remote -query test2.seq -task "megablast" -out test2.out

With the default evalue settings, I should get a lot of hits, but I get none. I've verified this on the NCBI web service, but it informs me that adjustments have been made to the parameters to accommodate my short search. I want to make these adjustments from the command line, but I'm not sure where to start.

Here's my output:

BLASTN 2.2.26+


Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.



Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
           16,169,102 sequences; 41,381,280,968 total letters



Query= ponzr1

Length=18

RID: XU4WV327016


***** No hits found *****



Lambda     K      H
    1.33    0.621     1.12

Gapped
Lambda     K      H
    1.28    0.460    0.850

Effective search space used: 123416233314


  Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
    Posted date:  Jun 15, 2012 10:12 AM
  Number of letters in database: 41,381,280,968
  Number of sequences in database:  16,169,102



Matrix: blastn matrix 1 -2
Gap Penalties: Existence: 0, Extension: 2.5
6.8 years ago by
kevin.l.neff200
Mayo Clinic College of Medicine
kevin.l.neff200 wrote:

Oh, the option -task "blastn-short" seems to do the trick.

It only works for queries of 17 bases or longer, though. I wish this were documented somewhere: the task is mentioned in the command-line manual, but its settings are not.
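
A likely reason the megablast run finds nothing: the megablast task defaults to a word size of 28, which is longer than the 18-base query, so no seed match can be found whatever -evalue is set to. Below is a minimal sketch of the rerun with blastn-short, reusing the file names from the question. The command -v guard and the error message are additions of mine so the snippet is harmless where BLAST+ is not installed.

```shell
# Write the 18-base test query from the question to a FASTA file.
printf '>ponzr\nCGCGGTAAAACACATTTG\n' > test2.seq

# Same search as in the question, with the task switched to blastn-short.
# Guarded so nothing runs if blastn is not on PATH.
if command -v blastn >/dev/null 2>&1; then
    blastn -db nt -remote -query test2.seq -task blastn-short -out test2.out \
        || echo "search failed (network or BLAST+ problem)" >&2
fi
```

If the search succeeds, the hits should end up in test2.out, as in the original command.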

6.8 years ago by
kevin.l.neff200
Mayo Clinic College of Medicine
kevin.l.neff200 wrote:

Quoting http://www.ncbi.nlm.nih.gov/BLAST/Why.shtml:

You can adjust both the word size and the expect value on the standard BLAST pages to work with short sequences. However, we do provide a BLAST page with these values preset to give optimum results with short sequences. This page ("Search for short and nearly exact matches") is linked under the nucleotide BLAST section of the main BLAST page. The adjustments are described in the table below.

Program                             Word Size   Filter Setting  Expect Value
------------------------------------------------------------------------------
Standard Nucleotide BLAST              11         On (DUST)           10
Search for short/near exact matches     7         Off               1000
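
The short-query row of the table above maps onto command-line options roughly as follows. This is a sketch, not a confirmed equivalent of what the web page does: -word_size, -dust and -evalue are standard blastn options, but pairing them with -task blastn and reusing the file names from the question are assumptions on my part.

```shell
# Settings taken from the "short/near exact matches" row of the table.
WORD_SIZE=7      # word size 7 instead of the standard 11
DUST=no          # low-complexity (DUST) filter off
EVALUE=1000      # expect threshold raised from 10 to 1000

# Assumes test2.seq holds the query; guarded in case blastn is not installed.
if command -v blastn >/dev/null 2>&1 && [ -f test2.seq ]; then
    blastn -task blastn -db nt -remote -query test2.seq \
        -word_size "$WORD_SIZE" -dust "$DUST" -evalue "$EVALUE" \
        -out test2.out || echo "search failed" >&2
fi
```

With an expect threshold this high, many of the reported hits are likely to be chance matches, so the output needs filtering by identity and alignment length.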
11 weeks ago by
TEF0
Bodø, Norway
TEF0 wrote:

I got "blastn-short" working with as few as 10 bases on BLASTN 2.8.1, but that was the limit.

$ echo ">short_query" > short_query.fa && echo "CCATATCACC" >> short_query.fa

Blasted this query against a database of 249 ebola genomes:

$ blastn -task blastn-short -db all_ebola_genomes -query short_query.fa

Partial output:

> KM233035.1 Zaire ebolavirus isolate Ebola virus/H.sapiens-wt/SLE/2014/Makona-EM104, 
complete genome
Length=18912

 Score = 20.3 bits (10),  Expect = 7.2
 Identities = 10/10 (100%), Gaps = 0/10 (0%)
 Strand=Plus/Plus

Query  1    CCATATCACC  10
            ||||||||||
Sbjct  102  CCATATCACC  111


> KM233036.1 Zaire ebolavirus isolate Ebola virus/H.sapiens-wt/SLE/2014/Makona-EM106, 
complete genome
Length=18873

 Score = 20.3 bits (10),  Expect = 7.2
 Identities = 10/10 (100%), Gaps = 0/10 (0%)
 Strand=Plus/Plus

Query  1    CCATATCACC  10
            ||||||||||
Sbjct  98   CCATATCACC  107