No alias or index file found for protein database
3.8 years ago

I am trying to install provean so I can run it on the cluster at my university.

However, I am running into an issue. When I run the example to test whether everything is working correctly, I get the following error:

BLAST Database error: No alias or index file found for protein database [BLAST_DB=/path/to/provean/provean-1.1.5/nr]
in search path [/path/to/provean/provean-1.1.5::]


I saw that a lot of people have had a similar issue, but my case is a bit different because I am not using BLAST directly; provean calls BLAST for me.

When I first downloaded the database they used in their initial paper, the first thing I noticed was that there was no alias or index file in it:

ls
nr.00.phd  nr.00.pog  nr.01.phd  nr.01.pog  nr.02.phd  nr.02.pog  nr.03.phd  nr.03.pog  nr.04.phd  nr.04.pog  nr.05.phd  nr.05.pog  nr.pal
nr.00.phi  nr.00.ppd  nr.01.phi  nr.01.ppd  nr.02.phi  nr.02.ppd  nr.03.phi  nr.03.ppd  nr.04.phi  nr.04.ppd  nr.05.phi  nr.05.ppd
nr.00.phr  nr.00.ppi  nr.01.phr  nr.01.ppi  nr.02.phr  nr.02.ppi  nr.03.phr  nr.03.ppi  nr.04.phr  nr.04.ppi  nr.05.phr  nr.05.ppi
nr.00.pin  nr.00.psd  nr.01.pin  nr.01.psd  nr.02.pin  nr.02.psd  nr.03.pin  nr.03.psd  nr.04.pin  nr.04.psd  nr.05.pin  nr.05.psd
nr.00.pnd  nr.00.psi  nr.01.pnd  nr.01.psi  nr.02.pnd  nr.02.psi  nr.03.pnd  nr.03.psi  nr.04.pnd  nr.04.psi  nr.05.pnd  nr.05.psi
nr.00.pni  nr.00.psq  nr.01.pni  nr.01.psq  nr.02.pni  nr.02.psq  nr.03.pni  nr.03.psq  nr.04.pni  nr.04.psq  nr.05.pni  nr.05.psq


I am new to BLAST and wasn't sure how to make an index, but I saw that another piece of software I use had a database that I could use instead:

ls
nr     nr.00.psi  nr.01.psd  nr.02.pni  nr.03.pnd  nr.04.pin  nr.05.phr  nr.05.psq  nr.06.psi  nr.07.psd  nr.08.pni  nr.09.pnd
nr.00.phr  nr.00.psq  nr.01.psi  nr.02.psd  nr.03.pni  nr.04.pnd  nr.05.pin  nr.06.phr  nr.06.psq  nr.07.psi  nr.08.psd  nr.09.pni
nr.00.pin  nr.01.phr  nr.01.psq  nr.02.psi  nr.03.psd  nr.04.pni  nr.05.pnd  nr.06.pin  nr.07.phr  nr.07.psq  nr.08.psi  nr.09.psd
nr.00.pnd  nr.01.pin  nr.02.phr  nr.02.psq  nr.03.psi  nr.04.psd  nr.05.pni  nr.06.pnd  nr.07.pin  nr.08.phr  nr.08.psq  nr.09.psi
nr.00.pni  nr.01.pnd  nr.02.pin  nr.03.phr  nr.03.psq  nr.04.psi  nr.05.psd  nr.06.pni  nr.07.pnd  nr.08.pin  nr.09.phr  nr.09.psq
nr.00.psd  nr.01.pni  nr.02.pnd  nr.03.pin  nr.04.phr  nr.04.psq  nr.05.psi  nr.06.psd  nr.07.pni  nr.08.pnd  nr.09.pin  nr.pal


You can see there is now a file called nr, which I believe is the index, but there are fewer files per volume now :(
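For a protein database, BLAST resolves the name given as the database by looking for an alias file (`<name>.pal`) or an index file (`<name>.pin`), so a listing that contains `nr.pal` but no plain `nr` file can still be valid. Below is a minimal sketch of that check. The function name `blastdb_prefix_ok` is made up for illustration, and the path is the placeholder from the post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if PREFIX.pal (alias file) or PREFIX.pin
# (single-volume index) is readable, i.e. one of the files BLAST looks for
# when resolving a protein database name.
blastdb_prefix_ok() {
    local prefix="$1"
    [ -r "${prefix}.pal" ] || [ -r "${prefix}.pin" ]
}

# Placeholder path from the post; replace with the real location.
db="/path/to/provean/provean-1.1.5/nr"
if blastdb_prefix_ok "$db"; then
    echo "found alias or index file for $db"
else
    echo "no $db.pal or $db.pin found"
fi
```

Note that the argument is the database prefix (directory plus `nr`), not the directory on its own.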

Here is how I am configuring provean

./configure PSIBLAST=/path/to/psiblast CDHIT=/path/to/cd-hit BLASTDBCMD=/path/to/blastdbcmd BLAST_DB=/path/to/provean/provean-1.1.5/nr --prefix=/path/provean/provean-1.1.5
make
make install
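After `make install`, the generated `provean.sh` carries the configured values as plain variable assignments (`BLAST_DB`, `PSIBLAST`, `CD_HIT`, `BLASTDBCMD`, as in the script posted further down this thread). A small sketch for printing them, to see what `configure` actually wrote; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the four configuration assignments found at the
# start of a line in a provean.sh file. Commented-out lines are not matched.
show_provean_config() {
    local script="$1"
    grep -E '^(BLAST_DB|PSIBLAST|CD_HIT|BLASTDBCMD)=' "$script"
}

# Usage (placeholder path):
#   show_provean_config /path/to/provean.sh
```

If the printed `BLAST_DB` value is not the intended database prefix, the problem is in the configuration rather than in BLAST itself.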


When I run the example

/path/to/provean.sh -q /path/to/examples/P04637.fasta -v /path/to/examples/P04637.var --save_supporting_set /path/to/examples/P04637.sss


I get the error above

BLAST Database error: No alias or index file found for protein database [BLAST_DB=/path/to/provean/provean-1.1.5/nr]
in search path [/path/to/provean/provean-1.1.5::]


I know I am making some kind of mistake but I am not sure where :(

Could the files be stored in the wrong place?

• Can you clarify which version of BLAST you are using? It is possible that provean is looking for a version of BLAST of a similar vintage (from 2011).
• Is BLAST included in provean, or did you download it separately?
• Does your cluster have a newer version of BLAST in your $PATH that may be causing the error (if provean came with the required older BLAST)?

I assume the /path/to placeholders refer to real paths on your cluster.

• I am using ncbi-blast-2.8.1+. I am not sure whether the BLAST version is the issue. I emailed them about the software not too long ago and they didn't mention any issues with the BLAST version.
• BLAST needs to be downloaded separately.
• I have never used provean on the cluster, so that shouldn't be an issue. Someone else may have a copy of BLAST, but I don't think that would be an issue.
• That is correct about /path/to. I can add the full paths if that makes it easier. Some of the lines were getting a bit long.

Like I said in the last thread, trying to use a much newer version of BLAST with indexes from 2011 (ftp://ftp.jcvi.org/pub/data/provean/nr_Aug_2011/) may be one of the problems. I assume provean has not been updated since 2011, so whether it can use the latest blast+ and the latest set of indexes is somewhat doubtful. It sounds like you are doing the right thing but still getting that error. One solution may be to find an old version of BLAST (from around 2011) and give it a whirl with the old indexes. It just depends on how desperate you are to run provean. It may be easier to find a newer solution.

Using a new version of BLAST could be the issue. I may give an older version of BLAST a try. I am glad that it seems like I am doing everything right so far. I've been picking people's brains on this site for a bit now, and I feel like I am picking up on things. I am just trying to get a couple of SNP predictors to run on my cluster to analyze thousands of SNPs. Do you have any suggestions for newer alternatives?

How about using snpEff or Variant Effect Predictor then? Do you have bacterial samples?

Those are good and I use both of them. VEP has the option to run SIFT, which gives each SNP a score indicating whether it is predicted to be damaging or benign. I just wanted to get maybe one more predictor running, so I was looking into provean. I am working with canine samples.

That is a good one! I don't think they do the prediction for dogs. When I was looking at it, it looked like it was just for humans. I was honestly struggling to figure out what other species they support other than the ones they list on the homepage.

Is this BLAST_DB=BLAST_DB=/path/to/provean/provean-1.1.5/nr a typo, or is that what you actually used on your command line?

See last bullet. That is a placeholder: C: No alias or index file found for protein database

Sorry, that was also a typo. Great catch though!!

Can you set BLAST_DB to the top-level directory where the nr files are and see if that helps? See below. You may need to make clean and make again.
BLAST_DB=/path/to/provean/provean-1.1.5/.

I had a similar thought and tried that but no luck :(

Was worth a try 😉. Good to hear you have already tried it.

Do you think it is worth making my own database? I am just not sure what they used to make the nr database. I guess they used the human genome. Do you know much about this?

It does not look like they are doing anything special (see this page). Just the latest nr direct from NCBI should work along with the latest blast+. Have you tried downloading the latest nr indexes from NCBI? The nr database is now significantly larger, so be aware of that.
Since the provean web page does not have a time stamp, it is difficult to know when it was last updated.

Back when I talked to the people who are in charge of provean, they sent me the latest, and it was still from 2014-2015, so still sort of old. I did look at the indexes and I did see how big they are now. That is why I wanted to give the database they listed a try: it was much smaller.

See if your cluster has the latest nr. On shared compute, admins may download the databases regularly so that there are no multiple copies. Otherwise bite the bullet and get the new indexes. I would normally say to stick with @lieven's suggestion below, but the databases provean provides are just too old.

I think I might have to bite the bullet : / If it was on the cluster, someone would have downloaded it themselves and hidden it away.

What genomax is pointing to is that there might be a shared location (one that the whole group or department has access to, so everyone uses the same BLAST DB indices). I would suggest first getting this working before downloading the newest version of the BLAST DBs (those are huge and will likely take you a few hours, or a day, to download). On the other hand, you might need them anyway, so yes, perhaps download them in parallel.

Yeah, it doesn't look like there is one, or at least it isn't documented that there is one. Good point though. Downloading the newest version may not help if I can't get the software working in the first place. I will see if I can track down the version that was used when they tested the software.

Is your nr DB actually located in that /path/to/provean/provean-1.1.5/ folder? And is it accessible?

Yes, that is correct.
I originally had it in its own folder but moved it to /path/to/provean/provean-1.1.5/ because I thought maybe the software was looking for the index in /path/to/provean/provean-1.1.5/ but wasn't finding it.

On a side note: the first ls you did of the nr DB folder looks OK! It does not need to contain a file called nr. BLAST will look for the nr.pal file or for the basename of all the *.p.. files. I would personally stick to the DB version that comes with the tool and not (yet) change it for a different versioned DB.
EDIT: stick to the provided one until this issue is resolved and then of course try to update (as pointed out by genomax, the DB that comes with the tool is way outdated by now).

Oh, I didn't know that. Thanks for letting me know. That is good to know!

Long shot, but try to set this to the provean path: BLASTDB=/path/to/ (no underscore in it)

No luck : / still getting the same thing

If it's not too long, can you post the content of the provean.sh file you are using?

#!/bin/bash
####################
# CONFIGURATION
####################
# Specify the path to database and program
#
BLAST_DB="BLAST_DB=/gpfs_common/share01/kmmeurs/provean/"
PSIBLAST="/gpfs_common/share01/kmmeurs/provean/ncbi-blast-2.2.25+/bin/psiblast"
CD_HIT="/gpfs_common/share01/kmmeurs/provean/cd-hit-v4.8.1-2019-0228/cd-hit"
BLASTDBCMD="/gpfs_common/share01/kmmeurs/provean/ncbi-blast-2.2.25+/bin/blastdbcmd"
# END CONFIGURATION
####################

shopt -s -o nounset

SCRIPT="provean.sh"
SCRIPT_DIR=$(readlink -f $0)
SCRIPT_DIR=${SCRIPT_DIR%/*}

if [ -z "$BLAST_DB" ] ; then
    echo "error: BLAST database name is missing. Please edit provean.sh file to add the name."
    exit 1
fi

if [ -z "$PSIBLAST" ] ; then
    echo "error: psiblast path is missing. Please edit provean.sh file to add the path."
    exit 1
fi

if [ -z "$CD_HIT" ] ; then
    echo "error: cd-hit path is missing. Please edit provean.sh file to add the path."
    exit 1
fi

if [ -z "$BLASTDBCMD" ] ; then
    echo "error: blastdbcmd path is missing. Please edit provean.sh file to add the path."
    exit 1
fi

QUERY=
VARIATION=
QUIET="--quiet"
SSS=
SAVE_SSS=
VERBOSE=
TMP_DIR=

# check getopt mode
getopt -T
if [ $? -ne 4 ] ; then
    echo "error: Requires enhanced getopt, obtain new version."
    exit 1
fi

OPTSTRING="q:v:Vh"
LOPTSTRING="query:,variation:,save_supporting_set:,supporting_set:,num_threads:,tmp_dir:,verbose,help"
USAGE="PROVEAN v1.1.5

USAGE:
  provean.sh [Options]

Example:
  # Given a query sequence in aaa.fasta file,
  # compute scores for variations in bbb.var file
  provean.sh -q aaa.fasta -v bbb.var

Required arguments:
  -q <string>, --query <string>
      Query protein sequence filename in fasta format
  -v <string>, --variation <string>
      Variation filename containing a list of variations:
        one entry per line in HGVS notation,
        e.g.: G105C, F508del, Q49dup, Q49_P50insC, Q49_R52delinsLI

Optional arguments:
  --save_supporting_set <string>
      Saves supporting sequence set infomation into a given filename
  --supporting_set <string>
      Supporting sequence set filename saved with '--save_supporting_set' option above
      (This will save time for BLAST search and clustering.)
  --tmp_dir <string>
      Temporary directory used to store temporary files
  --num_threads <integer>
      Number of threads (CPUs) to use in BLAST search
  -V, --verbose
      Verbosely shows the information about procedure
  -h, --help
      Gives this help message
"

RESULT=$(getopt -n "$SCRIPT" -o "$OPTSTRING" -l "$LOPTSTRING" -- "$@")
if [ $? -ne 0 ] ; then
    # parsing error, show usage
    echo "$USAGE"
    exit 1
fi

eval set -- "$RESULT"
while [ true ] ; do
    case "$1" in
        -q|--query)
            shift
            QUERY="$1"
            ;;
        -v|--variation)
            shift
            VARIATION="$1"
            ;;
        -V|--verbose)
            QUIET=""
            ;;
        --supporting_set)
            shift
            SSS="$1"
            ;;
        --save_supporting_set)
            shift
            SAVE_SSS="$1"
            ;;
        --tmp_dir)
            shift
            TMP_DIR="$1"
            ;;
        --num_threads)
            shift
            NUM_THREADS="$1"
            ;;
        -h|--help)
            echo "$USAGE"
            exit 0
            ;;
        --)
            shift
            break
            ;;
    esac
    shift
done

if [ -z "$QUERY" ] ; then
    echo "error: need query sequence filename"
    exit 1
fi

if [ -z "$VARIATION" ] ; then
    echo "error: need variation filename"
    exit 1
fi

COMMAND="$SCRIPT_DIR/provean -q $QUERY -v $VARIATION -d $BLAST_DB --psiblast $PSIBLAST --cdhit $CD_HIT --blastdbcmd $BLASTDBCMD $QUIET"

if [ -n "$SAVE_SSS" ] ; then
    COMMAND="$COMMAND --save_supporting_set $SAVE_SSS"
fi

if [ -n "$SSS" ] ; then
    COMMAND="$COMMAND --supporting_set $SSS"
fi

if [ -n "$TMP_DIR" ]; then
    COMMAND="$COMMAND --tmp_dir $TMP_DIR"
fi

if [ -n "$NUM_THREADS" ]; then
    COMMAND="$COMMAND --num_threads $NUM_THREADS"
fi

# run command
$COMMAND

STATUS=$?
exit $STATUS


So ignore the path to BLAST_DB; that was just me trying to see if I could get anything to work. I think I am going to try running BLAST on its own and see if that works. If I can't even get BLAST to run by itself, then maybe that is the underlying issue here.

PSIBLAST="/gpfs_common/share01/kmmeurs/provean/ncbi-blast-2.2.25+/bin/psiblast"


If that is a real path then the app is configured to use a really OLD version of blast.

That is a great catch! Remember we talked about using the version of BLAST that was used in their original paper with their original 2011 database? That is why you see that old version. I actually get the same error as before, so I am starting to think I am making some kind of mistake. I didn't get a chance to play with this today, but I'm going to look into it this weekend.

When you do work on this again, make sure those configuration variables consistently point to either the new or the old set of BLAST executables and databases. No mixing allowed :-)
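One cheap way to catch mixing on the executable side is to confirm that `psiblast` and `blastdbcmd` come from the same BLAST+ directory. The sketch below does only that; it says nothing about whether the database matches, and the function name `same_blast_install` is invented for illustration. The two paths are the ones from the provean.sh posted earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if both executables live in the same
# directory, which is a reasonable proxy for "same BLAST+ release".
same_blast_install() {
    local psiblast="$1" blastdbcmd="$2"
    [ "$(dirname "$psiblast")" = "$(dirname "$blastdbcmd")" ]
}

# Paths taken from the provean.sh posted in this thread.
PSIBLAST="/gpfs_common/share01/kmmeurs/provean/ncbi-blast-2.2.25+/bin/psiblast"
BLASTDBCMD="/gpfs_common/share01/kmmeurs/provean/ncbi-blast-2.2.25+/bin/blastdbcmd"

if same_blast_install "$PSIBLAST" "$BLASTDBCMD"; then
    echo "psiblast and blastdbcmd come from the same directory"
else
    echo "warning: psiblast and blastdbcmd come from different directories"
fi
```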

I added a comment below if you were curious about how I resolved this problem! Thank you so much for your help, I really appreciate the time you and lieven.sterck took to help me.

Sorry, that is a bit long. I posted most of it, but I cut out some of the less useful parts such as the copyright notice.

Looking at it (and without knowing what exactly provean does with it), I get the impression that BLAST_DB="BLAST_DB=/gpfs_common/share01/kmmeurs/provean/" needs to point to the BLAST DB itself, so you will indeed have to add nr to it (but then we're back to square one, right? :/)
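The two problems raised in this thread with the `BLAST_DB` value (a literal `BLAST_DB=` inside the value, and a directory instead of a database name ending in `nr`) can be checked mechanically. A rough sketch, with a made-up function name; it inspects the string only and does not touch the file system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag BLAST_DB values that look wrong.
# Prints a diagnosis and returns non-zero when the value is suspicious.
check_blast_db_value() {
    local value="$1"
    case "$value" in
        "")
            echo "value is empty"
            return 1 ;;
        BLAST_DB=*)
            echo "value starts with a literal 'BLAST_DB=': $value"
            return 1 ;;
        */)
            echo "value is a directory, not a database name such as .../nr: $value"
            return 1 ;;
    esac
    echo "ok: $value"
}

# Example calls:
#   check_blast_db_value "BLAST_DB=/gpfs_common/share01/kmmeurs/provean/"   # flagged
#   check_blast_db_value "/gpfs_common/share01/kmmeurs/provean/nr"          # accepted
```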

Perhaps we need to uncouple the BLAST part from the provean part altogether.

Can you run a BLAST search against that nr DB directly from the command line? Just take a small protein file and do something like blastp -query <file> -db /gpfs_common/share01/kmmeurs/provean/nr
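A slightly fuller version of that standalone test might first ask `blastdbcmd` whether it can open the database at all, and only then run a small `blastp` search. This is a sketch under the assumption that `blastdbcmd` and `blastp` from the same BLAST+ install are on the PATH; the function name and the query file name are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check a protein BLAST database outside of provean.
try_blast_db() {
    local db="$1" query="$2"
    local tool
    for tool in blastdbcmd blastp; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "error: $tool not found on PATH" >&2
            return 127
        fi
    done
    # Step 1: can BLAST open the database? Prints a summary of the database.
    blastdbcmd -db "$db" -dbtype prot -info || return 1
    # Step 2: does a small search run? Tabular output, at most 5 targets.
    blastp -query "$query" -db "$db" -outfmt 6 -max_target_seqs 5
}

# Usage (database path from this thread; the query file is a placeholder):
#   try_blast_db /gpfs_common/share01/kmmeurs/provean/nr small_protein.fasta
```

If step 1 already fails with the same "No alias or index file found" message, the problem lies with the database path or files rather than with provean.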

I am not sure of the best way to reach the two of you, but I wanted to thank you both for your help. I was able to figure this problem out (I had to put this project on the back burner).

First I used conda to download BLAST and CD-HIT, figuring that would simplify things, but I was getting the same errors. So I did what you two suggested and reverted to what was used in the original paper. Provean suggests not using v4.6 or v4.6.1 of CD-HIT. Configuring older versions of CD-HIT gave me issues related to my version of gcc, so I figured it would be easier to play with the version of BLAST instead.

I started with 2.2.25, which is what they originally used with Provean. This sort of worked, but the software would crash, and the crash was related to the version of BLAST I was using. I was having a similar issue to this post. I saw that using version 2.2.31 would solve the problem. After updating to version 2.2.31 everything worked!!!!! So exciting!

I am not entirely sure why the newest version of BLAST doesn't work, but it seems that using version 2.2.31 of BLAST and a newer version of CD-HIT (avoiding v4.6 and v4.6.1) solves the problem. The version of CD-HIT that I am using is v4.8.1. The command that I ended up using was:

./configure PSIBLAST=/path/ncbi-blast-2.2.31+/bin/psiblast BLASTDBCMD=/path/ncbi-blast-2.2.31+/bin/blastdbcmd CDHIT=/path/anaconda.2.7/bin/cd-hit BLAST_DB=/path/provean-1.1.5/nr_Aug_2011/nr --prefix=/path/provean-1.1.5/
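Because the working setup depended on a specific BLAST+ release, it can help to confirm which version a configured `psiblast` path really reports before running `configure`. The sketch below assumes that the first line of the `-version` output has the form `psiblast: 2.2.31+`, which is worth confirming on the actual install; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read BLAST+ "-version" output on stdin and print the
# version token. Assumes a first line like "psiblast: 2.2.31+".
blast_version_from_output() {
    awk 'NR == 1 { print $2 }'
}

# Hypothetical helper: print the version reported by a given executable.
blast_version() {
    "$1" -version | blast_version_from_output
}

# Usage (placeholder path from the post):
#   blast_version /path/ncbi-blast-2.2.31+/bin/psiblast
```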