wrong version of formatdb was used to make database
4.3 years ago

I am using the UniRef100 database to run the PSIPRED script (runpsipred) on macOS. This is the runpsipred script I am running:

#!/bin/tcsh

# This is a simple script which will carry out all of the basic steps
# required to make a PSIPRED prediction. Note that it assumes that the
# following programs are in the appropriate directories:
# blastpgp - PSIBLAST executable (from NCBI toolkit)
# makemat - IMPALA utility (from NCBI toolkit)
# psipred - PSIPRED V4 program
# psipass2 - PSIPRED V4 program

# NOTE: Script modified to be more cluster friendly (DTJ April 2008)

# The name of the BLAST data bank
set dbname = /Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta

# Where the NCBI programs have been installed
set ncbidir = /Users/shalini/Desktop/pgen_github/ncbi-blast-2.10.0+/bin

# Where the PSIPRED V4 programs have been installed
set execdir = /Users/shalini/Desktop/pgen_github/pGenTHREADER-master/psipred/bin

# Where the PSIPRED V4 data files have been installed
set datadir = /Users/shalini/Desktop/pgen_github/pGenTHREADER-master/psipred/data

set basename = $1:r
set rootname = $basename:t

# Generate a "unique" temporary filename root
set hostid = `hostid`
set tmproot = psitmp$$$hostid

\cp -f $1 $tmproot.fasta

echo "Running PSI-BLAST with sequence" $1 "..."

$ncbidir/blastpgp -b 0 -v 5000 -j 3 -h 0.001 -d $dbname -i $tmproot.fasta -C $tmproot.chk >& $tmproot.blast

if ($status != 0) then
    tail $tmproot.blast
    echo "FATAL: Error whilst running blastpgp - script terminated!"
    exit $status
endif

echo "Predicting secondary structure..."

echo $tmproot.chk > $tmproot.pn
echo $tmproot.fasta > $tmproot.sn

$ncbidir/makemat -P $tmproot

if ($status != 0) then
    echo "FATAL: Error whilst running makemat - script terminated!"
    exit $status
endif

echo Pass1 ...

$execdir/psipred $tmproot.mtx $datadir/weights.dat $datadir/weights.dat2 $datadir/weights.dat3 > $rootname.ss

if ($status != 0) then
    echo "FATAL: Error whilst running psipred - script terminated!"
    exit $status
endif

echo Pass2 ...

$execdir/psipass2 $datadir/weights_p2.dat 1 1.0 1.0 $rootname.ss2 $rootname.ss > $rootname.horiz

if ($status != 0) then
    echo "FATAL: Error whilst running psipass2 - script terminated!"
    exit $status
endif

# Remove temporary files

echo Cleaning up ...
\rm -f $tmproot.* error.log

echo "Final output files:" $rootname.ss2 $rootname.horiz
echo "Finished."

The error is:

Running PSI-BLAST with sequence /Users/shalini/Desktop/pgen_github/unmodelled_fasta/NP_999840.fasta ...
[NULL_Caption] WARNING: readdb: wrong version of formatdb was used to make database //Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta.65.
[NULL_Caption] WARNING: readdb: wrong version of formatdb was used to make database //Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta.66.
[NULL_Caption] WARNING: readdb: wrong version of formatdb was used to make database //Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta.67.
[NULL_Caption] WARNING: readdb: wrong version of formatdb was used to make database //Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta.68.
[NULL_Caption] WARNING: readdb: wrong version of formatdb was used to make database //Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta.69.
[NULL_Caption] WARNING: readdb: wrong version of formatdb was used to make database //Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta.70.
[NULL_Caption] WARNING: readdb: wrong version of formatdb was used to make database //Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta.71.
[NULL_Caption] WARNING: readdb: wrong version of formatdb was used to make database //Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta.72.
[NULL_Caption] WARNING: readdb: wrong version of formatdb was used to make database //Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta.73.
[NULL_Caption] WARNING: Unable to open uniref100.fasta.pin
FATAL: Error whilst running blastpgp - script terminated!

I made the database using:

makeblastdb -in /Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta -parse_seqids -blastdb_version 5 -title "Unirefdb" -dbtype prot

which resulted in:

Building a new DB, current time: 01/14/2020 15:45:24
New DB name:   /Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta
New DB title:  Unirefdb
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 199397329 sequences in 7009.84 seconds.

A total of 73 .psq, .pin, .pog and .phr files were generated.

blastpgp runpsipred formatdb • 2.7k views

Where did you get the blastpgp executable from (as far as I know, it's not part of the default BLAST+ package), and which version are you using?

Perhaps try formatting your DB using the older -blastdb_version 4. In any case, when formatting a large DB you should also get one .pal file (or .nal for nucleotides), which groups all the subparts.
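
A quick way to check the point about the alias file is to look at what sits next to the database base name. The sketch below is only illustrative: the function names `count_volumes` and `check_db` are made up here, and it assumes volume files are named `<db>.NN.pin` as in the warnings above.

```shell
#!/bin/sh
# Sketch: check whether a multi-volume protein BLAST database has its alias file.
# The argument is the base name passed to -d (for example the uniref100.fasta path).

# Print how many volume index files (<db>.NN.pin) exist.
count_volumes() {
    n=0
    for f in "$1".*.pin; do
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

# Print a one-line summary of what was found for the given base name.
check_db() {
    db=$1
    vols=$(count_volumes "$db")
    if [ "$vols" -gt 0 ] && [ -f "$db.pal" ]; then
        echo "multi-volume: $vols volumes, alias file present"
    elif [ "$vols" -gt 0 ]; then
        echo "multi-volume: $vols volumes, alias file missing"
    elif [ -f "$db.pin" ]; then
        echo "single-volume"
    else
        echo "no index files found"
    fi
}

# Run the check only when a base name is given on the command line.
if [ $# -ge 1 ]; then
    check_db "$1"
fi
```

Saved as, say, `checkdb.sh`, it would be run as `sh checkdb.sh /Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta`.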

4.3 years ago
Mensur Dlakic ★ 27k

Most likely you need to format your protein database with the formatdb that matches your version of blastpgp. Based on the location of your BLAST programs, you would format the database like this:

/Users/shalini/Desktop/pgen_github/ncbi-blast-2.10.0+/bin/formatdb -i /Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta -p T -t "Unirefdb"

You may want to first delete previously formatted files.


formatdb has been deprecated for years now, so I don't think it will be available in the 2.10.0+ version.


You are right about formatdb being deprecated. On the other hand, PSIPRED still works best with legacy BLAST versions that include formatdb, so here is the link to the last version that should work:

ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy.NOTSUPPORTED/2.2.26/
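
Putting the two suggestions together (delete the previously formatted files, then reformat with the legacy formatdb), a rough sketch could look like the following. `LEGACY_BIN` is a hypothetical install location for the 2.2.26 binaries, and the helper names are invented for illustration. The script defaults to a dry run that only prints the commands, since the `p??` pattern is broad and could match unrelated files.

```shell
#!/bin/sh
# Sketch: clear old index files, then rebuild with legacy formatdb.
# LEGACY_BIN is a hypothetical install location for the 2.2.26 binaries.
LEGACY_BIN=${LEGACY_BIN:-$HOME/blast-2.2.26/bin}
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

# Print index files from an earlier run, e.g. <db>.pin, <db>.00.psq, <db>.pal.
# The p?? pattern is broad, so review the dry-run output before deleting.
list_old_index() {
    for f in "$1".p?? "$1".*.p??; do
        [ -e "$f" ] && echo "$f"
    done
    return 0
}

# Echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Remove old index files, then format the FASTA file given as the base name.
rebuild_db() {
    db=$1
    list_old_index "$db" | while IFS= read -r f; do
        run rm -f "$f"
    done
    # Same flags as in the answer above: protein input (-p T) with a title (-t).
    run "$LEGACY_BIN/formatdb" -i "$db" -p T -t "Unirefdb"
}

# Run only when a FASTA path is given on the command line.
if [ $# -ge 1 ]; then
    rebuild_db "$1"
fi
```

Run it once as is to inspect the printed commands, then again with `DRY_RUN=0` once the list of files to delete looks right.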


Thank you so much for your reply.

I resolved the problem by using formatdb. But after running the program for 3 hours on a single protein, it gives me this error:

Running PSI-BLAST with sequence /Users/shalini/Desktop/pgen_github/unmodelled_fasta/AAA18895.fasta ...

[blastpgp 2.2.26] WARNING:  [000.000]  Failed to access string index ISAM Error code is -5

[blastpgp 2.2.26] WARNING:  [000.000]  Failed to retrieve sequence lcl|UniRef100_T1G0H5
[blastpgp 2.2.26] WARNING:  [000.000]  Failed to access string index ISAM Error code is -5

[blastpgp 2.2.26] WARNING:  [000.000]  Failed to retrieve sequence lcl|UniRef100_T1G0H5
blastpgp(1042,0xa986a1c0) malloc: *** mach_vm_map(size=8388608) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
[blastpgp 2.2.26] FATAL ERROR: CoreLib [001.000]  Failed to allocate 40100 bytes
FATAL: Error whilst running blastpgp - script terminated!

I am using macOS High Sierra; the processor is a 3.3 GHz Intel Core i7 and the memory is 16 GB of 1867 MHz DDR3.

I look forward to your reply. I need your help.


I am not an expert on BLAST error messages, but it could be that you don't have enough memory. I suggest you try the UniRef90 database, as it is considerably smaller.
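
To see how much smaller a candidate database actually is before reformatting it, one option is to count FASTA records and bytes. This is only a rough proxy for what blastpgp will need in memory, and the file paths you pass in are your own; the makeblastdb log above reported 199397329 sequences for UniRef100, which gives a baseline to compare against.

```shell
#!/bin/sh
# Sketch: compare FASTA databases by sequence count and file size.

# Count records by counting header lines (those starting with '>').
count_seqs() {
    grep -c '^>' "$1"
}

# Print the file name, sequence count and size in bytes for each FASTA file given.
for fasta in "$@"; do
    printf '%s\t%s sequences\t%s bytes\n' \
        "$fasta" "$(count_seqs "$fasta")" "$(wc -c < "$fasta" | tr -d ' ')"
done
```

Pass it one or more FASTA files, for example the UniRef100 and UniRef90 downloads, and compare the two lines it prints.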
