Question: [solved] Retrieve FASTA from BLAST db using blastdbcmd: Error: gi|742519789: OID not found
4.0 years ago by dago (2.5k), Germany
dago wrote:

I saw that many people have had problems retrieving sequences from a BLAST db.

I could not find a way around it, so maybe someone has a good link/suggestion/reference I could use.

I want to extract sequences from the nr db.

I have a list of identifiers, obtained from a previous BLAST search:

gi|740719731|ref|WP_038505017.1|
gi|740813732|ref|WP_038599015.1|
gi|740864652|ref|WP_038649903.1|
gi|740899195|ref|WP_038684443.1|
gi|740906294|ref|WP_038691542.1|

 

Now I try to query using only the GIs:

740864652
740899195
740906294

 

or only the ref accessions:

WP_038649903.1
WP_038684443.1
WP_038691542.1
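For reference, both lists can be cut out of the full identifiers with standard shell tools. This is only a minimal sketch: hits.txt, gi_ids.txt and ref_ids.txt are hypothetical file names, and it assumes every line follows the gi|<GI>|ref|<accession>| layout shown above.

```shell
# hits.txt is a hypothetical file holding identifiers from the previous search.
cat > hits.txt <<'EOF'
gi|740864652|ref|WP_038649903.1|
gi|740899195|ref|WP_038684443.1|
gi|740906294|ref|WP_038691542.1|
EOF

# Field 2 of the pipe-delimited identifier is the GI number.
cut -d'|' -f2 hits.txt > gi_ids.txt

# Field 4 is the ref accession, including its version suffix.
cut -d'|' -f4 hits.txt > ref_ids.txt
```

Either output file can then be given to blastdbcmd through -entry_batch.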

 

blastdbcmd -db ~/Documents/nr_blastdb/nr -entry_batch Ids

 

But I always get:

Error: XXXXX: OID not found

 

What am I missing here?

 

 

4.0 years ago by 5heikki (8.4k), Finland
5heikki wrote:

Did you make the nr db yourself from the FASTA file, or did you download the pre-formatted db files? If the former, did you pass the -parse_seqids flag to makeblastdb? If not, there's your problem.
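If the db was built by hand, rebuilding it with ID parsing switched on might look like the sketch below. The function name rebuild_nr and the input file name nr.fasta are assumptions for illustration only; -in, -dbtype, -parse_seqids and -out are regular makeblastdb options.

```shell
# Hypothetical wrapper: rebuild a protein BLAST database so that the sequence
# IDs in the FASTA headers are parsed and indexed for later lookups by ID.
rebuild_nr() {
    fasta=$1   # input FASTA file, e.g. nr.fasta (assumed name)
    db=$2      # output database prefix, e.g. ~/Documents/nr_blastdb/nr
    makeblastdb -in "$fasta" -dbtype prot -parse_seqids -out "$db"
}

# Example call (not run here):
# rebuild_nr nr.fasta ~/Documents/nr_blastdb/nr
```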


Not quite sure. I got it from a colleague. Could it be a problem related to the index of the entries?

 

written 4.0 years ago by dago (2.5k)

What does

cat ~/Documents/nr_blastdb/nr.pal

show?

If it's a pre-formatted db, the title line is something like:

TITLE All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples from WGS projects
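To pull out just that line, something like the following sketch could be used. db_title is a hypothetical helper name, and it assumes the alias file sits next to the database volumes and carries the .pal suffix.

```shell
# Hypothetical helper: print the TITLE line of a protein database alias file.
# $1 is the database prefix, e.g. ~/Documents/nr_blastdb/nr (".pal" is appended here).
db_title() {
    grep '^TITLE' "$1.pal"
}

# Example call (not run here):
# db_title ~/Documents/nr_blastdb/nr
```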

 

modified 4.0 years ago • written 4.0 years ago by 5heikki (8.4k)

Perfect, it was a manually created db. I will try your first suggestion.

written 4.0 years ago by dago (2.5k)

You were right, it works fine now. However, some sequences now have really crazy IDs, for example:

> XXXXXX >XXXXX

CGSNDHIEJWPSP

It looks like two IDs, one after the other. Any idea where the problem is?

written 4.0 years ago by dago (2.5k)

It's a non-redundant database, and those two accessions refer to an identical protein sequence, so they share a single entry. You can avoid this behavior with the -target_only flag of blastdbcmd.
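As a rough sketch, the retrieval command from the question with that flag added could be wrapped like this. The function name fetch_target_only and the output file name in the example call are made up for illustration; -db, -entry_batch and -target_only are regular blastdbcmd options.

```shell
# Hypothetical wrapper: fetch the entries listed in a batch file and keep only
# the requested ID in each definition line, not every identical entry.
fetch_target_only() {
    db=$1    # database prefix, e.g. ~/Documents/nr_blastdb/nr
    ids=$2   # file with one identifier per line, e.g. Ids
    blastdbcmd -db "$db" -entry_batch "$ids" -target_only
}

# Example call (not run here):
# fetch_target_only ~/Documents/nr_blastdb/nr Ids > hits.fasta
```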

written 4.0 years ago by 5heikki (8.4k)

I learned many things today! Thanks very much!!!!

written 4.0 years ago by dago (2.5k)