Blastdbcmd command is failing to give sequences.
8.9 years ago
Prasad ▴ 50

Hello all,

I am trying to retrieve all sequences from the NCBI nr database using the following command:

blastdbcmd -entry 'all' -db Path/to/db -outfmt '%f' -out output.fasta

It starts writing the FASTA file, but the command fails after some time. The error message I get is:

Error: CSeqDBAtlas::MapMmap: While mapping file [/mnt/LV1/blast_db/nr_nt/nr.07.psq] with 12898483378 bytes allocated, caught exception:
NCBI C++ Exception:
    "/build/buildd/ncbi-blast+-2.2.28/c++/src/objtools/blast/seqdb_reader/seqdbatlas.cpp", line 152: Error: ncbi::SeqDB_ThrowException() - Validation failed: [end <= file_size] at /build/buildd/ncbi-blast+-2.2.28/c++/src/objtools/blast/seqdb_reader/seqdbatlas.cpp:506

Has anyone faced this problem? If so, how can I fix this error?

Any help is appreciated.

Thanks

blast • 3.9k views
8.9 years ago
pld 5.1k

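It seems like your database might be corrupted. The failed check in the error ([end <= file_size] while mapping nr.07.psq) is what you would expect if one of the volume files is incomplete, for example because it did not download completely.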

Is there any reason why you need every single sequence from NR in FASTA? That is a massive file. You can just download it manually from the FTP site:

ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/
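
To follow up on the "incomplete file" idea: a minimal sketch of how the downloaded volume archives could be checked, assuming each archive (e.g. nr.07.tar.gz) was downloaded together with a companion .md5 file in the usual `md5sum` format and that both sit in the same directory. The function name `verify_volumes` and the directory layout are my own assumptions, not something from the original post.

```shell
# Check every *.tar.gz archive in a directory against its .md5 companion file.
# Assumption: files are named like nr.07.tar.gz and nr.07.tar.gz.md5.
verify_volumes() {
    dir="$1"
    rc=0
    for md5file in "$dir"/*.tar.gz.md5; do
        # Skip the unexpanded pattern when no .md5 files are present
        [ -e "$md5file" ] || continue
        # md5sum -c re-hashes the file named inside the .md5 file
        if ! (cd "$dir" && md5sum -c "$(basename "$md5file")" >/dev/null 2>&1); then
            echo "FAILED: ${md5file%.md5}" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Calling `verify_volumes /path/to/downloads` returns non-zero and names the offending archive if any checksum does not match; that volume would then need to be downloaded and unpacked again. If your BLAST+ installation includes the `blastdbcheck` utility, running it on the unpacked database is another option.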


I am trying to compare two versions of the NR database. I want to get the GI numbers (GenInfo Identifiers) from both versions so that I can BLAST my sequences against only the entries with new GIs. For the latest NR database I could download the FASTA file and grep all the GIs from it. For the older NR database we don't have a FASTA file, so I first tried to get just the GIs from the database using

blastdbcmd -entry 'all' -db Path/to/db -outfmt '%g' -out gi_id_list.txt

That process is taking forever: by my calculations it would take around 27 days to produce gi_id_list.txt.

That is why I was trying to dump every single sequence from NR as FASTA, so that I could quickly grep the GIs out of it.

Is there any quicker way to get all the GIs from the older NR database?
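
For the comparison step itself (once both GI lists exist), a minimal sketch, assuming two plain-text files with one GI per line. The function name `new_only_gis` and the file names in the usage note are hypothetical.

```shell
# Print the GIs that appear in the new list but not in the old one.
# Both arguments are text files with one GI per line.
new_only_gis() {
    old_list="$1"
    new_list="$2"
    tmpdir=$(mktemp -d) || return 1
    # comm needs both inputs sorted with the same collation, so force the C locale
    LC_ALL=C sort -u "$old_list" > "$tmpdir/old.sorted"
    LC_ALL=C sort -u "$new_list" > "$tmpdir/new.sorted"
    # -1 drops lines unique to the old list, -3 drops lines common to both
    LC_ALL=C comm -13 "$tmpdir/old.sorted" "$tmpdir/new.sorted"
    rm -rf "$tmpdir"
}
```

For example, `new_only_gis old_gi_list.txt new_gi_list.txt > new_gis.txt`. The output is sorted as text rather than numerically, which does not matter for a set difference. As far as I know, a list like this can then be passed to the BLAST search programs through their `-gilist` option, or to `blastdbcmd -entry_batch` to pull only those sequences.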


See the thread "Is there any BLAST database archive?" for an alternative approach.


Sounds good to me. I will give this approach a try.


If you read the documentation for blastdbcmd, it will lay out some of the available output options. I know it is possible to only collect ids/accessions/etc.


Yes, blastdbcmd can output just the IDs. I did try

blastdbcmd -entry 'all' -db Path/to/db -outfmt '%g' -out gi_id_list.txt

which outputs only the IDs, but as I said in my earlier comment, it would take forever to complete.

I am looking for a faster way to get the GI list from the NR database. So far, the quickest way I can see to get the GIs is from the FASTA file.


I think either way will take a really long time unless you can fit the whole file into memory. Getting the GIs from the FASTA file would require you to parse every FASTA definition line, which might slow you down.
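
For what it's worth, the definition-line parsing can be done as a stream, so the file never has to fit in memory. A rough sketch, assuming the older NR header style `>gi|12345|ref|NP_...|` with merged entries separated by Ctrl-A on the same line; the function name `extract_gis` is made up for this example, and I have not timed it on a full NR dump.

```shell
# Stream a FASTA file and print every GI found on its definition lines.
# Assumption: headers use the old "gi|<number>|" style identifiers.
extract_gis() {
    fasta="$1"
    # Keep only header lines; -a treats the input as text despite control characters
    LC_ALL=C grep -a '^>' "$fasta" \
        | LC_ALL=C grep -a -o 'gi|[0-9][0-9]*' \
        | cut -d'|' -f2
}
```

Usage would be something like `extract_gis nr.fasta > gi_id_list.txt`. One caveat: a title that happens to contain text such as "gi|123" would also be picked up, so treat the output as approximate unless you check for that.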
