Question: wrong version of formatdb was used to make database
5 months ago, shalinikaushik1293 wrote:

I am using the UniRef100 database to run the PSIPRED script (runpsipred) on macOS. This is the runpsipred script I am running:

#!/bin/tcsh

# This is a simple script which will carry out all of the basic steps
# required to make a PSIPRED prediction. Note that it assumes that the
# following programs are in the appropriate directories:
# blastpgp - PSIBLAST executable (from NCBI toolkit)
# makemat - IMPALA utility (from NCBI toolkit)
# psipred - PSIPRED V4 program
# psipass2 - PSIPRED V4 program

# NOTE: Script modified to be more cluster friendly (DTJ April 2008)

# The name of the BLAST data bank
set dbname = /Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta

# Where the NCBI programs have been installed
set ncbidir = /Users/shalini/Desktop/pgen_github/ncbi-blast-2.10.0+/bin

# Where the PSIPRED V4 programs have been installed
set execdir = /Users/shalini/Desktop/pgen_github/pGenTHREADER-master/psipred/bin

# Where the PSIPRED V4 data files have been installed
set datadir = /Users/shalini/Desktop/pgen_github/pGenTHREADER-master/psipred/data

set basename = $1:r
set rootname = $basename:t

# Generate a "unique" temporary filename root
set hostid = `hostid`
set tmproot = psitmp$$$hostid

\cp -f $1 $tmproot.fasta

echo "Running PSI-BLAST with sequence" $1 "..."

$ncbidir/blastpgp -b 0 -v 5000 -j 3 -h 0.001 -d $dbname -i $tmproot.fasta -C $tmproot.chk >& $tmproot.blast

if ($status != 0) then
    tail $tmproot.blast
    echo "FATAL: Error whilst running blastpgp - script terminated!"
    exit $status
endif

echo "Predicting secondary structure..."

echo $tmproot.chk > $tmproot.pn
echo $tmproot.fasta > $tmproot.sn

$ncbidir/makemat -P $tmproot

if ($status != 0) then
    echo "FATAL: Error whilst running makemat - script terminated!"
    exit $status
endif

echo Pass1 ...

$execdir/psipred $tmproot.mtx $datadir/weights.dat $datadir/weights.dat2 $datadir/weights.dat3 > $rootname.ss

if ($status != 0) then
    echo "FATAL: Error whilst running psipred - script terminated!"
    exit $status
endif

echo Pass2 ...

$execdir/psipass2 $datadir/weights_p2.dat 1 1.0 1.0 $rootname.ss2 $rootname.ss > $rootname.horiz

if ($status != 0) then
    echo "FATAL: Error whilst running psipass2 - script terminated!"
    exit $status
endif

# Remove temporary files

echo Cleaning up ...
\rm -f $tmproot.* error.log

echo "Final output files:" $rootname.ss2 $rootname.horiz
echo "Finished."

The error is:

Running PSI-BLAST with sequence /Users/shalini/Desktop/pgen_github/unmodelled_fasta/NP_999840.fasta ...
[NULL_Caption] WARNING: readdb: wrong version of formatdb was used to make database //Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta.65.
[NULL_Caption] WARNING: readdb: wrong version of formatdb was used to make database //Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta.66.
[NULL_Caption] WARNING: readdb: wrong version of formatdb was used to make database //Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta.67.
[NULL_Caption] WARNING: readdb: wrong version of formatdb was used to make database //Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta.68.
[NULL_Caption] WARNING: readdb: wrong version of formatdb was used to make database //Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta.69.
[NULL_Caption] WARNING: readdb: wrong version of formatdb was used to make database //Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta.70.
[NULL_Caption] WARNING: readdb: wrong version of formatdb was used to make database //Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta.71.
[NULL_Caption] WARNING: readdb: wrong version of formatdb was used to make database //Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta.72.
[NULL_Caption] WARNING: readdb: wrong version of formatdb was used to make database //Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta.73.
[NULL_Caption] WARNING: Unable to open uniref100.fasta.pin
FATAL: Error whilst running blastpgp - script terminated!

I made the database using:

makeblastdb -in /Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta -parse_seqids -blastdb_version 5 -title "Unirefdb" -dbtype prot

which results in:

Building a new DB, current time: 01/14/2020 15:45:24
New DB name:   /Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta
New DB title:  Unirefdb
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 199397329 sequences in 7009.84 seconds.

A total of 73 .psq, .pin, .pog and .phr files were generated.

Tags: formatdb, runpsipred, blastpgp
written 5 months ago by shalinikaushik1293 • modified 5 months ago by Mensur Dlakic (5.8k)

Where did you get the blastpgp executable from? (As far as I know, it is not part of the default BLAST package.) And consequently, which version are you using?

Perhaps try formatting your DB using the older -blastdb_version 4. In any case, when formatting a large DB you should also get one .pal file (or .nal for nucleotides), which 'groups' all the subparts.

written 5 months ago by lieven.sterck (7.9k)
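
The point about the alias file can be checked by looking at what sits next to the FASTA file. The following is only a minimal sketch: `check_blastdb` is a hypothetical helper name, and it assumes the multi-volume naming visible in the warnings above (index volumes such as `<db>.NN.pin`, plus a `<db>.pal` alias file for a protein database).

```shell
# Hypothetical helper (not part of BLAST or PSIPRED): report whether a
# protein database prefix has an alias file and how many index volumes exist.
check_blastdb() {
    db=$1
    volumes=0
    # Count multi-volume index files such as <db>.00.pin, <db>.01.pin, ...
    for f in "$db".*.pin; do
        if [ -e "$f" ]; then
            volumes=$((volumes + 1))
        fi
    done
    # The .pal alias file is what groups the volumes under one database name.
    if [ -e "$db.pal" ]; then
        echo "alias=yes volumes=$volumes"
    else
        echo "alias=no volumes=$volumes"
    fi
}
```

Calling `check_blastdb /Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta` would print one summary line; `alias=no` together with many volumes would match the situation described in this comment.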
5 months ago, Mensur Dlakic (5.8k, USA) wrote:

Most likely you need to format your protein database with a formatdb that matches your version of blastpgp. Based on the location of your BLAST programs, you would format the database like this:

/Users/shalini/Desktop/pgen_github/ncbi-blast-2.10.0+/bin/formatdb -i /Users/shalini/Desktop/pgen_github/uniref100/uniref100.fasta -p T -t "Unirefdb"

You may want to first delete previously formatted files.

written 5 months ago by Mensur Dlakic (5.8k)
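
Before deleting previously formatted files, it may help to list them first so that the FASTA file itself is not removed by accident. This is a cautious sketch rather than a complete cleanup tool: `list_formatted_files` is a hypothetical helper, and the extension list is an assumption limited to the file types named in this thread (.phr, .pin, .psq, .pog) plus the .pal alias file. Other index files may exist depending on the options used.

```shell
# Hypothetical helper: print the BLAST index files that belong to a protein
# database prefix, without touching the FASTA file itself.
list_formatted_files() {
    db=$1
    # Assumed extension list; extend it if your database has other files.
    for ext in phr pin psq pog pal; do
        # Single-volume files (<db>.pin) and multi-volume files (<db>.00.pin).
        for f in "$db.$ext" "$db".*."$ext"; do
            if [ -e "$f" ]; then
                echo "$f"
            fi
        done
    done
}
```

Once the printed list looks right, the same loop could call `rm -f "$f"` in place of `echo "$f"`.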

formatdb has been deprecated for years now, so I don't think it will be available in the 2.10.0+ version.

written 5 months ago by lieven.sterck (7.9k)

Right you are about formatdb being deprecated. On the other hand, PSIPRED still works best with legacy BLAST versions that have formatdb, so here is the link to the last version that should work:

ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy.NOTSUPPORTED/2.2.26/

written 5 months ago by Mensur Dlakic (5.8k)
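
Since runpsipred calls blastpgp and makemat from the directory set in `ncbidir`, and the database has to be built with a matching formatdb, one simple check is whether all three legacy programs are present in the same directory. The sketch below assumes exactly that; `missing_legacy_tools` is a hypothetical name, and the directory passed to it is whatever location the legacy 2.2.26 package was unpacked to.

```shell
# Hypothetical helper: list which legacy BLAST programs are missing
# (or not executable) in the given directory.
missing_legacy_tools() {
    dir=$1
    missing=""
    for tool in blastpgp makemat formatdb; do
        if [ ! -x "$dir/$tool" ]; then
            missing="$missing $tool"
        fi
    done
    # Trim the leading space; print "none" when every program was found.
    missing=${missing# }
    echo "${missing:-none}"
}
```

If the output is anything other than `none`, the `ncbidir` setting in the script probably points at a directory that does not hold the legacy programs.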

Thank you so much for your reply.

I resolved the problem by using formatdb. However, after running for 3 hours on a single protein, the program gives me this error:

Running PSI-BLAST with sequence /Users/shalini/Desktop/pgen_github/unmodelled_fasta/AAA18895.fasta ...

[blastpgp 2.2.26] WARNING:  [000.000]  Failed to access string index ISAM Error code is -5

[blastpgp 2.2.26] WARNING:  [000.000]  Failed to retrieve sequence lcl|UniRef100_T1G0H5
[blastpgp 2.2.26] WARNING:  [000.000]  Failed to access string index ISAM Error code is -5

[blastpgp 2.2.26] WARNING:  [000.000]  Failed to retrieve sequence lcl|UniRef100_T1G0H5
blastpgp(1042,0xa986a1c0) malloc: *** mach_vm_map(size=8388608) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
[blastpgp 2.2.26] FATAL ERROR: CoreLib [001.000]  Failed to allocate 40100 bytes
FATAL: Error whilst running blastpgp - script terminated!

written 5 months ago by shalinikaushik1293

I am using macOS High Sierra with a 3.3 GHz Intel Core i7 processor and 16 GB of 1867 MHz DDR3 memory.

Looking forward to your reply. I need your help.

written 5 months ago by shalinikaushik1293

I am not an expert on BLAST error messages, but it could be that you don't have enough memory. I suggest you try UniRef90 database as it is considerably smaller.

written 5 months ago by Mensur Dlakic (5.8k)
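
To compare how much smaller one database is than another before reformatting, the sequence and residue counts of the FASTA files can be tallied directly. This is a rough sketch: `fasta_stats` is a hypothetical helper, and the counts only indicate relative database size; they do not predict how much memory blastpgp will need.

```shell
# Hypothetical helper: print "<sequences> <residues>" for a FASTA file.
fasta_stats() {
    awk '
        /^>/ { n++; next }                                # header line: one more sequence
        { gsub(/[ \t\r]/, ""); r += length($0) }          # sequence line: add its residues
        END { printf "%d %d\n", n, r }
    ' "$1"
}
```

Running it on the UniRef100 and UniRef90 FASTA files would give two pairs of numbers to compare side by side.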