How To Build The Correct Blast Url For The Rest Interface
10.5 years ago

Hi All,

I am having difficulties constructing a BLAST query via the REST interface.

In particular I would like to have a defline via QUERY_BELIEVE_DEFLINE and actually submit two sequences.

The URL I construct is this (by the way, the URL is apparently too long for the forum): http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?QUERY_BELIEVE_DEFLINE=yes&QUERY=%3E%20PMID%2019846642%20Forward%0D%0AATGTTCGATTCCGGCTTCC%0D%0%3E%20PMID%2019846642%20Reverse%0D%0CGGGCTTCGGTGTACCTCAT&PROGRAM=blastn&DATABASE=nr&ENTREZ_QUERY=gyra%20[gene]%20AND%20AL123456.3%20AND%20txid83332%20[ORGN]&CMD=PUT

When I look at the query in the BLAST website interface, the Query Sequence looks like this:

>19846642 Forward
ATGTTCGATTCCGGCTTCCMDRVRSGGGCTTCGGTGTACCTCAT

What I really want is my Query to look like this:

>PMID 19846642 Forward
ATGTTCGATTCCGGCTTCC
>PMID 19846642 Reverse
GGGCTTCGGTGTACCTCAT

So it looks like the percent-encoding (hex conversion) of the query partly worked: the first sequence comes through, but the second definition line is merged into the sequence.

What am I doing wrong in my query construction?
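
One way to narrow this down is to decode the QUERY value locally and look at the control characters. The sketch below runs Python's `urllib.parse.unquote` on the QUERY string copied from the URL above. How the BLAST server itself treats malformed escapes is not stated anywhere in this thread, so the interpretation in the comments is an assumption, not a description of NCBI's parser.

```python
from urllib.parse import unquote

# QUERY value copied verbatim from the URL in the question.
raw = ("%3E%20PMID%2019846642%20Forward%0D%0AATGTTCGATTCCGGCTTCC"
       "%0D%0%3E%20PMID%2019846642%20Reverse%0D%0CGGGCTTCGGTGTACCTCAT")

decoded = unquote(raw)
# repr() makes \r, \n and other control characters visible.
print(repr(decoded))

# IUPAC nucleotide letters. Assumption: characters outside this set are
# dropped when a line is read as sequence rather than as a definition line.
IUPAC = set("ACGTURYSWKMBDHVN")
stray = "".join(ch.upper() for ch in "PMID Reverse" if ch.upper() in IUPAC)
print(stray)  # prints MDRVRS
```

The decoded text shows that only the first line break is a complete `%0D%0A`. The second one is `%0D%0` followed directly by `%3E` (the `A` is missing), and the third is `%0D%0C`, where `%0C` is a form feed rather than a line feed. The IUPAC letters of "PMID Reverse" are exactly the `MDRVRS` that shows up between the two primers, which is consistent with the second definition line being read as sequence.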

10.5 years ago

My URL is:

$ curl -s "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=PUT&PROGRAM=blastn&DATABASE=nr&QUERY_BELIEVE_DEFLINE=false&ENTREZ_QUERY=gyra%20%5Bgene%5D%20AND%20AL123456.3%20AND%20txid83332%20%5BORGN%5D&QUERY=%3EPMID%2019846642%20Forward%0AATGTTCGATTCCGGCTTCC%0A%3EPMID%2019846642%20Reverse%0AGGGCTTCGGTGTACCTCAT%0A"
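
If you would rather not percent-encode by hand, the same URL can be generated programmatically. This is a minimal sketch using only Python's standard library; the helper names `fasta` and `submit_url` are made up for illustration, and the endpoint and parameter values are simply the ones from the curl command above.

```python
from urllib.parse import quote, urlencode

# Endpoint as used in this thread.
BLAST_CGI = "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi"


def fasta(records):
    """Join (defline, sequence) pairs into multi-FASTA text, one \n per line end."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


def submit_url(records, entrez_query):
    """Build the submission URL; urlencode does all percent-encoding."""
    params = {
        "CMD": "PUT",
        "PROGRAM": "blastn",
        "DATABASE": "nr",
        "QUERY_BELIEVE_DEFLINE": "false",
        "ENTREZ_QUERY": entrez_query,
        "QUERY": fasta(records),
    }
    # quote_via=quote encodes spaces as %20 (the default would use '+').
    return BLAST_CGI + "?" + urlencode(params, quote_via=quote)


records = [
    ("PMID 19846642 Forward", "ATGTTCGATTCCGGCTTCC"),
    ("PMID 19846642 Reverse", "GGGCTTCGGTGTACCTCAT"),
]
url = submit_url(records, "gyra [gene] AND AL123456.3 AND txid83332 [ORGN]")
print(url)
```

Letting the library encode the whole FASTA string means every line break becomes a complete `%0A` and the brackets in ENTREZ_QUERY become `%5B` and `%5D`, so there is no escape sequence to mistype.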

Note QUERY_BELIEVE_DEFLINE=false: there is no gi on your FASTA definition lines.

The curl command returns:

(...)
<!--QBlastInfoBegin
    RID = 59MF46VT01R
    RTOE = 20
QBlastInfoEnd
-->
(...)
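
The RID and RTOE values can be pulled out of that response with a small parser. This is only a sketch: the function name `parse_qblast_info` is hypothetical, and it assumes the response always contains a QBlastInfoBegin/QBlastInfoEnd block laid out like the one shown above.

```python
import re


def parse_qblast_info(page):
    """Return (RID, RTOE) from the QBlastInfo comment block of a submit response."""
    block = re.search(r"QBlastInfoBegin(.*?)QBlastInfoEnd", page, re.S)
    if block is None:
        raise ValueError("no QBlastInfo block found in response")
    # Each line inside the block looks like "    KEY = value".
    fields = dict(re.findall(r"^[ \t]*(\w+)[ \t]*=[ \t]*(\S+)", block.group(1), re.M))
    return fields["RID"], int(fields["RTOE"])


sample = """<!--QBlastInfoBegin
    RID = 59MF46VT01R
    RTOE = 20
QBlastInfoEnd
-->"""
rid, rtoe = parse_qblast_info(sample)
print(rid, rtoe)  # prints 59MF46VT01R 20
```

RID is the request identifier needed to fetch the results, and RTOE is the estimated number of seconds before they are ready.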

I fetched the result with:

curl -s "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&RID=59MF46VT01R&FORMAT_TYPE=Text"
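
In a script, the fetch step usually has to wait until the search has finished. The sketch below is one possible way to do that, not something taken from this thread: it assumes that a request with `FORMAT_OBJECT=SearchInfo` returns a page containing `Status=WAITING` or `Status=READY`, and that polling once a minute is acceptable. The helper names (`get_url`, `http_get`, `fetch_text_report`) are hypothetical.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as used in this thread.
BLAST_CGI = "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi"


def get_url(rid, **extra):
    """Build a CMD=Get URL for a request ID, plus any extra parameters."""
    return BLAST_CGI + "?" + urlencode({"CMD": "Get", "RID": rid, **extra})


def http_get(url):
    """Download a URL and return its body as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_text_report(rid, rtoe=20, fetch=http_get, sleep=time.sleep, max_polls=30):
    """Wait RTOE seconds, poll the search status, then return the text report."""
    sleep(rtoe)
    for _ in range(max_polls):
        # Assumption: SearchInfo pages carry a Status=... line.
        info = fetch(get_url(rid, FORMAT_OBJECT="SearchInfo"))
        if "Status=READY" in info:
            return fetch(get_url(rid, FORMAT_TYPE="Text"))
        if "Status=WAITING" not in info:
            raise RuntimeError(f"search {rid} did not finish normally")
        sleep(60)  # assumed polling interval
    raise TimeoutError(f"search {rid} still not ready after {max_polls} polls")


print(get_url("59MF46VT01R", FORMAT_TYPE="Text"))
```

Calling `fetch_text_report("59MF46VT01R", rtoe=20)` would perform the real requests; `fetch` and `sleep` are parameters only so the logic can be tried without touching the network.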


BLASTN 2.2.28+
Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro
A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and
David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new
generation of protein database search programs", Nucleic
Acids Res. 25:3389-3402.


RID: 59MF46VT01R


Database: Nucleotide collection (nt)
           20,069,287 sequences; 50,671,056,400 total letters
Query= PMID 19846642 Forward

Length=19
                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

emb|AL123456.3|  Mycobacterium tuberculosis H37Rv complete genome  35.6    3e-04

ALIGNMENTS
>emb|AL123456.3| Mycobacterium tuberculosis H37Rv complete genome
Length=4411532

 Features in this part of subject sequence:
   DNA gyrase (subunit A) GyrA (DNA topoisomerase (ATP-hydro...

 Score = 35.6 bits (38),  Expect = 3e-04
 Identities = 19/19 (100%), Gaps = 0/19 (0%)
 Strand=Plus/Plus

Query  1     ATGTTCGATTCCGGCTTCC  19
             |||||||||||||||||||
Sbjct  7476  ATGTTCGATTCCGGCTTCC  7494


 Features in this part of subject sequence:
   Probable medium chain fatty-acid-CoA ligase FadD14 (fatty...

 Score = 26.5 bits (28),  Expect = 0.14
 Identities = 14/14 (100%), Gaps = 0/14 (0%)
 Strand=Plus/Plus

Query  3        GTTCGATTCCGGCT  16
                ||||||||||||||
Sbjct  1181898  GTTCGATTCCGGCT  1181911


 Features in this part of subject sequence:
   Probable arsenic-transport integral membrane protein ArsB1

 Score = 26.5 bits (28),  Expect = 0.14
 Identities = 14/14 (100%), Gaps = 0/14 (0%)
 Strand=Plus/Minus

Query  5        TCGATTCCGGCTTC  18
                ||||||||||||||
Sbjct  3002143  TCGATTCCGGCTTC  3002130


 Features in this part of subject sequence:
   Possible lipid transfer protein or keto acyl-CoA thiolase...

 Score = 26.5 bits (28),  Expect = 0.14
 Identities = 14/14 (100%), Gaps = 0/14 (0%)
 Strand=Plus/Minus

Query  1        ATGTTCGATTCCGG  14
                ||||||||||||||
Sbjct  3959155  ATGTTCGATTCCGG  3959142


 Features in this part of subject sequence:
   Glutamine synthetase GlnA1 (glutamine synthase) (GS-I)

 Score = 24.7 bits (26),  Expect = 0.48
 Identities = 13/13 (100%), Gaps = 0/13 (0%)
 Strand=Plus/Plus

Query  6        CGATTCCGGCTTC  18
                |||||||||||||
Sbjct  2487721  CGATTCCGGCTTC  2487733


 Features in this part of subject sequence:
   Probable sugar-transport integral membrane protein ABC tr...

 Score = 24.7 bits (26),  Expect = 0.48
 Identities = 13/13 (100%), Gaps = 0/13 (0%)
 Strand=Plus/Plus

Query  6        CGATTCCGGCTTC  18
                |||||||||||||
Sbjct  2589020  CGATTCCGGCTTC  2589032


 Features in this part of subject sequence:
   Probable epoxide hydrolase EphC (epoxide hydratase)

 Score = 22.9 bits (24),  Expect = 1.7
 Identities = 12/12 (100%), Gaps = 0/12 (0%)
 Strand=Plus/Plus

Query  4        TTCGATTCCGGC  15
                ||||||||||||
Sbjct  1247372  TTCGATTCCGGC  1247383


 Features in this part of subject sequence:
   Probable Fmu protein (sun protein)

 Score = 22.9 bits (24),  Expect = 1.7
 Identities = 12/12 (100%), Gaps = 0/12 (0%)
 Strand=Plus/Minus

(...)

Query= PMID 19846642 Reverse

Length=19


                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

emb|AL123456.3|  Mycobacterium tuberculosis H37Rv complete genome  35.6    3e-04

ALIGNMENTS
>emb|AL123456.3| Mycobacterium tuberculosis H37Rv complete genome
Length=4411532

 Features in this part of subject sequence:
   DNA gyrase (subunit A) GyrA (DNA topoisomerase (ATP-hydro...

 Score = 35.6 bits (38),  Expect = 3e-04
 Identities = 19/19 (100%), Gaps = 0/19 (0%)
 Strand=Plus/Minus

Query  1     GGGCTTCGGTGTACCTCAT  19
             |||||||||||||||||||
Sbjct  7698  GGGCTTCGGTGTACCTCAT  7680


 Features in this part of subject sequence:
   Conserved hypothetical protein

 Score = 26.5 bits (28),  Expect = 0.14
 Identities = 14/14 (100%), Gaps = 0/14 (0%)
 Strand=Plus/Minus

Query  3        GCTTCGGTGTACCT  16
                ||||||||||||||
Sbjct  1029956  GCTTCGGTGTACCT  1029943



Lambda      K        H
   0.634    0.408    0.912 
Gapped
Lambda      K        H
   0.625    0.410    0.780 
Matrix: blastn matrix:2 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Sequences: 1
Number of Hits to DB: 0
Number of extensions: 0
Number of successful extensions: 0
Number of sequences better than 10: 0
Number of HSP's better than 10 without gapping: 0
Number of HSP's gapped: 0
Number of HSP's successfully gapped: 0
Length of database: 4411534
A: 0
X1: 22 (20.1 bits)
X2: 33 (29.8 bits)
X3: 110 (99.2 bits)
S1: 22 (21.1 bits)
Thanks! I had not realized what the definition line really is. I thought it was basically a comment line, but it can apparently be used as part of the algorithm.
