Question: No alias or index file found for protein database
11 weeks ago by
williamsbrian5064170 wrote:

I am trying to download provean to run on the cluster at my university.

However, I am running into an issue. When I run the example to test whether everything is working correctly, I get the following error.

BLAST Database error: No alias or index file found for protein database [BLAST_DB=/path/to/provean/provean-1.1.5/nr] 
in search path [/path/to/provean/provean-1.1.5::]

I saw that a lot of people had a similar issue, but my case is a bit different because I am not calling BLAST directly; provean calls BLAST.

So when I first tried downloading the database they used in their initial paper, the first thing I noticed was that there was no alias or index file in this database

ls
nr.00.phd  nr.00.pog  nr.01.phd  nr.01.pog  nr.02.phd  nr.02.pog  nr.03.phd  nr.03.pog  nr.04.phd  nr.04.pog  nr.05.phd  nr.05.pog  nr.pal
nr.00.phi  nr.00.ppd  nr.01.phi  nr.01.ppd  nr.02.phi  nr.02.ppd  nr.03.phi  nr.03.ppd  nr.04.phi  nr.04.ppd  nr.05.phi  nr.05.ppd
nr.00.phr  nr.00.ppi  nr.01.phr  nr.01.ppi  nr.02.phr  nr.02.ppi  nr.03.phr  nr.03.ppi  nr.04.phr  nr.04.ppi  nr.05.phr  nr.05.ppi
nr.00.pin  nr.00.psd  nr.01.pin  nr.01.psd  nr.02.pin  nr.02.psd  nr.03.pin  nr.03.psd  nr.04.pin  nr.04.psd  nr.05.pin  nr.05.psd
nr.00.pnd  nr.00.psi  nr.01.pnd  nr.01.psi  nr.02.pnd  nr.02.psi  nr.03.pnd  nr.03.psi  nr.04.pnd  nr.04.psi  nr.05.pnd  nr.05.psi
nr.00.pni  nr.00.psq  nr.01.pni  nr.01.psq  nr.02.pni  nr.02.psq  nr.03.pni  nr.03.psq  nr.04.pni  nr.04.psq  nr.05.pni  nr.05.psq

I am a new user to blast and wasn't sure how to make an index, but I saw that another piece of software that I used had a database that I could use

ls 
nr     nr.00.psi  nr.01.psd  nr.02.pni  nr.03.pnd  nr.04.pin  nr.05.phr  nr.05.psq  nr.06.psi  nr.07.psd  nr.08.pni  nr.09.pnd
nr.00.phr  nr.00.psq  nr.01.psi  nr.02.psd  nr.03.pni  nr.04.pnd  nr.05.pin  nr.06.phr  nr.06.psq  nr.07.psi  nr.08.psd  nr.09.pni
nr.00.pin  nr.01.phr  nr.01.psq  nr.02.psi  nr.03.psd  nr.04.pni  nr.05.pnd  nr.06.pin  nr.07.phr  nr.07.psq  nr.08.psi  nr.09.psd
nr.00.pnd  nr.01.pin  nr.02.phr  nr.02.psq  nr.03.psi  nr.04.psd  nr.05.pni  nr.06.pnd  nr.07.pin  nr.08.phr  nr.08.psq  nr.09.psi
nr.00.pni  nr.01.pnd  nr.02.pin  nr.03.phr  nr.03.psq  nr.04.psi  nr.05.psd  nr.06.pni  nr.07.pnd  nr.08.pin  nr.09.phr  nr.09.psq
nr.00.psd  nr.01.pni  nr.02.pnd  nr.03.pin  nr.04.phr  nr.04.psq  nr.05.psi  nr.06.psd  nr.07.pni  nr.08.pnd  nr.09.pin  nr.pal

You can see the index now (nr, or at least I believe that is the index), but there are fewer files now :(

Here is how I am configuring provean

./configure PSIBLAST=/path/to/psiblast CDHIT=/path/to/cd-hit BLASTDBCMD=/path/to/blastdbcmd BLAST_DB=/path/to/provean/provean-1.1.5/nr --prefix=/path/provean/provean-1.1.5
make
make install

When I run the example

/path/to/provean.sh -q /path/to/examples/P04637.fasta -v /path/to/examples/P04637.var --save_supporting_set /path/to/examples/P04637.sss

I get the error above

BLAST Database error: No alias or index file found for protein database [BLAST_DB=/path/to/provean/provean-1.1.5/nr] 
in search path [/path/to/provean/provean-1.1.5::]

I know I am making some kind of mistake but I am not sure where :(

The files could be stored in the wrong place?

ADD COMMENTlink modified 28 days ago by Biostar ♦♦ 20 • written 11 weeks ago by williamsbrian5064170
1
  • Can you clarify which version of BLAST you are using? It is possible that provean is looking for a version of BLAST that is of a similar vintage (from 2011).
  • Is BLAST included in provean, or did you download it separately?
  • Does your cluster have a new version of BLAST that may be in your $PATH that is causing the error (if provean came with required older blast)?

I assume /path/to placeholders refer to real paths on your cluster.
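
The PATH question above can be checked with a few lines of shell. This is only a sketch: which_tool is a hypothetical helper name, and it assumes the BLAST+ program names psiblast and blastdbcmd that appear in the configure line.

```shell
#!/bin/bash
# Report where a tool resolves on $PATH, or say that it does not.
# which_tool is a hypothetical helper used only for this sketch.
which_tool() (
    tool="$1"
    if path=$(command -v "$tool"); then
        printf '%s -> %s\n' "$tool" "$path"
    else
        printf '%s -> not on PATH\n' "$tool"
    fi
)

# The two BLAST+ programs that provean is configured with.
for tool in psiblast blastdbcmd; do
    which_tool "$tool"
done
```

If a path is printed, running that binary with -version (for example psiblast -version) shows which BLAST+ release it belongs to.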

ADD REPLYlink modified 11 weeks ago • written 11 weeks ago by genomax68k
  • I am using this version of blast ncbi-blast-2.8.1+. I am not sure if it is an issue with the version of blast. I emailed them about the software not too long ago and they didn't mention any issues with the version of blast.
  • blast needs to be downloaded separately.
  • I have never used provean on the cluster so that shouldn't be an issue. Someone else may have a copy of blast but I don't think that would be an issue
  • that is correct about the /path/to. I can add the full path if that makes it easier. Some of the lines were getting a bit long
ADD REPLYlink written 11 weeks ago by williamsbrian5064170
1

As I said in the last thread, trying to use a much newer version of BLAST with indexes from 2011 (ftp://ftp.jcvi.org/pub/data/provean/nr_Aug_2011/ ) may be one of the problems.

I assume provean has not been updated since 2011, so it is somewhat doubtful whether it can use the latest blast+ and a latest set of indexes. It sounds like you are doing the right thing but still getting that error.

One solution may be to find an old version of blast (from around 2011) and give it a whirl with old indexes. It just depends on how desperate you are to run provean. It may be easier to find a newer solution.

ADD REPLYlink modified 11 weeks ago • written 11 weeks ago by genomax68k

Using a new version of BLAST could be the issue. I may try an older version of BLAST. I am glad that it seems like I am doing everything right so far. I've been picking people's brains on this site for a bit now, and I feel like I am picking up on things.

I am just trying to get a couple of SNP predictors to run on my cluster to analyze thousands of SNPs. I don't know if you have any suggestions for newer alternatives?

ADD REPLYlink written 11 weeks ago by williamsbrian5064170
1

How about using snpEff or variant effect predictor then? You have bacterial samples?

ADD REPLYlink modified 11 weeks ago • written 11 weeks ago by genomax68k

Those are good and I use both of them. VEP has the option to run sift, which will analyze your SNPs for you. It will just give the SNP a score to determine if it is predicted to be damaging or benign. I just wanted to get maybe one more predictor running so I was looking into provean. I am working with canine samples

ADD REPLYlink written 11 weeks ago by williamsbrian5064170
1

Annovar (http://annovar.openbioinformatics.org/en/latest/ ).

ADD REPLYlink written 11 weeks ago by genomax68k

That is a good one! I don't think they do the prediction for dogs. When I was looking at it, it looked like it was just for humans. I was honestly struggling to figure out what other species they support other than the ones they list on the homepage.

ADD REPLYlink written 11 weeks ago by williamsbrian5064170
1

Is this BLAST_DB=BLAST_DB=/path/to/provean/provean-1.1.5/nr a typo, or is that what you actually used on your command line?

ADD REPLYlink written 11 weeks ago by lieven.sterck5.4k
1

See last bullet. That is a place holder: C: No alias or index file found for protein database

ADD REPLYlink written 11 weeks ago by genomax68k

Sorry, that was also a typo. Great catch though!!

ADD REPLYlink written 11 weeks ago by williamsbrian5064170
1

Can you set BLAST_DB to the top level directory where nr files are and see if that helps? See below. You may need to make clean and make again.

BLAST_DB=/path/to/provean/provean-1.1.5/.

ADD REPLYlink modified 11 weeks ago • written 11 weeks ago by genomax68k

I had a similar thought and tried that but no luck :(

ADD REPLYlink written 11 weeks ago by williamsbrian5064170
1

Was worth a try 😉. Good to hear you have already tried it.

ADD REPLYlink written 11 weeks ago by genomax68k

Do you think it is worth making my own database? I am just not sure what they used to make the nr database? I guess they used the human genome. Do you know much about this?

ADD REPLYlink written 11 weeks ago by williamsbrian5064170
1

It does not look like they are doing anything special (see this page). Just the latest nr direct from NCBI should work along with latest blast+. Have you tried downloading the latest nr indexes from NCBI? nr database is now significantly larger so be aware of that.

Since provean web page does not have a time stamp, difficult to know when it was last updated.

ADD REPLYlink modified 11 weeks ago • written 11 weeks ago by genomax68k

Back when I talked to the people who are in charge of provean, they sent me the latest; it was still from 2014-2015, so still sort of old.

I did look at the indexes and I did see how big they are now. That is why I wanted to try the database they listed, because it was much smaller.

ADD REPLYlink written 11 weeks ago by williamsbrian5064170
1

See if your cluster has the latest nr. On shared compute systems, admins may download the indexes regularly so that there are not multiple copies. Otherwise bite the bullet and get the new indexes.

I would normally say to stick with @lieven's suggestion below, but the databases provean provides are just too old.

ADD REPLYlink modified 11 weeks ago • written 11 weeks ago by genomax68k

I think I might have to bite the bullet : /

If it was on the cluster someone would have downloaded it themselves and hidden it away.

ADD REPLYlink written 11 weeks ago by williamsbrian5064170
1

What genomax is pointing out is that there might be a shared location (one that the whole group or department has access to, so that everyone uses the same BLAST DB indices).

I would suggest first getting this working before downloading the newest version of the BLAST DBs (those are huge and will likely take you a few hours, or even a day, to download). On the other hand, you might need them anyway, so yes, perhaps download them in parallel.

ADD REPLYlink modified 11 weeks ago • written 11 weeks ago by lieven.sterck5.4k

Yeah, it doesn't look like there is one or at least it isn't documented that there is one.

Good point though. Downloading the newest version may not help if I can't get the software working in the first place. I will see if I can track down the version that was used when they tested the software.

ADD REPLYlink written 11 weeks ago by williamsbrian5064170
1

is your nr DB actually located in that /path/to/provean/provean-1.1.5/ folder? and accessible?

ADD REPLYlink written 11 weeks ago by lieven.sterck5.4k

Yes, that is correct. I originally had it in its own folder but moved it to /path/to/provean/provean-1.1.5/ because I thought maybe the software was looking for the index in /path/to/provean/provean-1.1.5/ but wasn't finding it.

ADD REPLYlink written 11 weeks ago by williamsbrian5064170
1

On a side note: the first ls you did of the nr DB folder looks OK! It does not specifically need to contain a file called nr. BLAST will look for the nr.pal file, or for the basename of all the *.p?? files.

I would personally stick to the DB version that comes with the tool and not (yet) change it with a different versioned DB.

EDIT: stick to the provided one until this issue is resolved and then of course try to update (as pointed out by genomax , the DB that comes with the tool is way outdated in the meanwhile)
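
A quick way to test the point about nr.pal from the shell, as a sketch only. has_protein_db is a hypothetical helper, and it assumes that for a protein database BLAST+ accepts either an alias file (NAME.pal) or a single-volume index (NAME.pin); the demo directory just imitates the first listing in the question.

```shell
#!/bin/bash
# has_protein_db DIR NAME: succeed if DIR holds a readable protein alias file
# (NAME.pal) or a readable single-volume index (NAME.pin).
# Hypothetical helper for illustration, not part of BLAST or provean.
has_protein_db() (
    dir="$1"
    name="$2"
    [ -r "$dir/$name.pal" ] || [ -r "$dir/$name.pin" ]
)

# Demo on a throwaway directory with empty stand-in files.
demo=$(mktemp -d)
touch "$demo/nr.pal" "$demo/nr.00.pin" "$demo/nr.00.phr" "$demo/nr.00.psq"

if has_protein_db "$demo" nr; then
    echo "alias or index file found for nr"
else
    echo "no alias or index file for nr in $demo"
fi

rm -rf "$demo"
```

Pointing the same function at the real database directory also checks read permission, which covers the "and accessible?" question earlier in the thread.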

ADD REPLYlink modified 11 weeks ago • written 11 weeks ago by lieven.sterck5.4k

Oh, I didn't know that. Thanks for letting me know. That is good to know!

ADD REPLYlink written 11 weeks ago by williamsbrian5064170
1

Long shot, but try setting this to the provean path: BLASTDB=/path/to/ (no underscore in it)
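
For reference, BLASTDB (no underscore) is the environment variable BLAST+ reads as its database search directory, while the database itself is then named without a directory. A sketch of splitting a full database path into those two parts; db_dir and db_name are hypothetical helper names and the path is the placeholder used throughout this thread.

```shell
#!/bin/bash
# Placeholder path from this thread; substitute the directory that holds nr.pal.
db_path="/path/to/provean/provean-1.1.5/nr"

# Hypothetical helpers: directory part and bare database name.
db_dir()  { dirname "$1"; }
db_name() { basename "$1"; }

BLASTDB=$(db_dir "$db_path")   # directory that BLAST+ searches for databases
export BLASTDB

echo "BLASTDB=$BLASTDB"
echo "database name: $(db_name "$db_path")"
```

Whether provean passes this environment through to psiblast is not confirmed anywhere in the thread, so treat it as something to try rather than a fix.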

ADD REPLYlink written 11 weeks ago by lieven.sterck5.4k

No luck : / still getting the same thing

ADD REPLYlink written 11 weeks ago by williamsbrian5064170
1

If it's not too long, can you post the content of the provean.sh file you are using?

ADD REPLYlink written 11 weeks ago by lieven.sterck5.4k
#!/bin/bash

####################
# CONFIGURATION
####################
# Specify the path to database and program
#
BLAST_DB="BLAST_DB=/gpfs_common/share01/kmmeurs/provean/"
PSIBLAST="/gpfs_common/share01/kmmeurs/provean/ncbi-blast-2.2.25+/bin/psiblast"
CD_HIT="/gpfs_common/share01/kmmeurs/provean/cd-hit-v4.8.1-2019-0228/cd-hit"
BLASTDBCMD="/gpfs_common/share01/kmmeurs/provean/ncbi-blast-2.2.25+/bin/blastdbcmd"
# END CONFIGURATION
####################



shopt -s -o nounset
SCRIPT="provean.sh"
SCRIPT_DIR=$(readlink -f $0)
SCRIPT_DIR=${SCRIPT_DIR%/*}

if [ -z "$BLAST_DB" ] ; then
    echo "error: BLAST database name is missing. Please edit provean.sh file to add the name."
    exit 1;
fi

if [ -z "$PSIBLAST" ] ; then
    echo "error: psiblast path is missing. Please edit provean.sh file to add the path."
    exit 1
fi

if [ -z "$CD_HIT" ] ; then
    echo "error: cd-hit path is missing. Please edit provean.sh file to add the path."
    exit 1
fi

if [ -z "$BLASTDBCMD" ] ; then
    echo "error: blastdbcmd path is missing. Please edit provean.sh file to add the path."
    exit 1
fi

QUERY=
VARIATION=
QUIET="--quiet"
SSS=
SAVE_SSS=
VERBOSE=
NUM_THREADS=
TMP_DIR=

# check getopt mode
getopt -T
if [ $? -ne 4 ] ; then 
    echo "error: Requires enhanced getopt, obtain new version."
    exit 1;
fi

OPTSTRING="q:v:Vh"
LOPTSTRING="query:,variation:,save_supporting_set:,supporting_set:,num_threads:,tmp_dir:,verbose,help"
USAGE="PROVEAN v1.1.5

USAGE:
  provean.sh [Options]

Example:
 # Given a query sequence in aaa.fasta file, 
 # compute scores for variations in bbb.var file 
 provean.sh -q aaa.fasta -v bbb.var

Required arguments:
 -q <string>, --query <string>
   Query protein sequence filename in fasta format
 -v <string>, --variation <string>
   Variation filename containing a list of variations:
     one entry per line in HGVS notation,
     e.g.: G105C, F508del, Q49dup, Q49_P50insC, Q49_R52delinsLI

Optional arguments:
 --save_supporting_set <string>
   Saves supporting sequence set infomation into a given filename
 --supporting_set <string>
   Supporting sequence set filename saved with '--save_supporting_set' option above
   (This will save time for BLAST search and clustering.)
 --tmp_dir <string>
   Temporary directory used to store temporary files
 --num_threads <integer>
   Number of threads (CPUs) to use in BLAST search
 -V, --verbose
   Verbosely shows the information about procedure
 -h, --help
   Gives this help message
"

RESULT=$(getopt -n "$SCRIPT" -o "$OPTSTRING" -l "$LOPTSTRING" -- "$@")
if [ $? -ne 0 ] ; then
    # parsing error, show usage
    echo "$USAGE" 
    exit 1
fi

eval set -- "$RESULT"
while [ true ] ; do
    case "$1" in
        -q|--query) 
            shift 
            QUERY="$1"
        ;;
        -v|--variation)
            shift
            VARIATION="$1"
        ;;
        -V|--verbose)
            QUIET=""
        ;;
        --supporting_set)
            shift
            SSS="$1"
        ;;
        --save_supporting_set)
            shift
            SAVE_SSS="$1"
        ;;
        --tmp_dir)
            shift
            TMP_DIR="$1"
        ;;
        --num_threads)
            shift
            NUM_THREADS="$1"
        ;;
        -h|--help)
            echo "$USAGE"
            exit 0
        ;;
        --)
            shift
            break
        ;;
    esac
    shift
done

if [ -z "$QUERY" ] ; then
    echo "error: need query sequence filename" 
    exit 1
fi

if [ -z "$VARIATION" ] ; then
    echo "error: need variation filename"
    exit 1
fi

COMMAND="$SCRIPT_DIR/provean -q $QUERY -v $VARIATION -d $BLAST_DB --psiblast $PSIBLAST --cdhit $CD_HIT --blastdbcmd $BLASTDBCMD $QUIET"

if [ -n "$SAVE_SSS" ] ; then
    COMMAND="$COMMAND --save_supporting_set $SAVE_SSS"
fi

if [ -n "$SSS" ] ; then
    COMMAND="$COMMAND --supporting_set $SSS"
fi

if [ -n "$TMP_DIR" ]; then
    COMMAND="$COMMAND --tmp_dir $TMP_DIR"
fi

if [ -n "$NUM_THREADS" ]; then
    COMMAND="$COMMAND --num_threads $NUM_THREADS"
fi

# run command
$COMMAND

STATUS=$?

exit $STATUS

So ignore the path to BLAST_DB; that was just me trying to see if I could get anything to work. I think I am going to try to just run blast and see if that is working. I was thinking that if I can't even get blast to run by itself, then maybe that is the underlying issue here.

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by williamsbrian5064170
1
PSIBLAST="/gpfs_common/share01/kmmeurs/provean/ncbi-blast-2.2.25+/bin/psiblast"

If that is a real path then the app is configured to use a really OLD version of blast.

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by genomax68k

That is a great catch! Remember we talked about using the version of BLAST that was used in their original paper with their original 2011 database? That is why you see that old version. Actually, I get the same error as before, so I am starting to think I am making some kind of mistake. I didn't get a chance to play with this today but I'm going to look into it this weekend.

ADD REPLYlink written 10 weeks ago by williamsbrian5064170

When you do work on this again, make sure those configuration variables point consistently to either the new or the old set of blast executables/databases. No mixing allowed :-)
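
One way to sanity-check the "no mixing" rule for the executables, as a rough sketch. The two paths are the ones posted in provean.sh; same_install is a hypothetical helper that only compares directories, so it says nothing about whether the database matches that BLAST+ release.

```shell
#!/bin/bash
# Paths as posted in the CONFIGURATION block of provean.sh.
PSIBLAST="/gpfs_common/share01/kmmeurs/provean/ncbi-blast-2.2.25+/bin/psiblast"
BLASTDBCMD="/gpfs_common/share01/kmmeurs/provean/ncbi-blast-2.2.25+/bin/blastdbcmd"

# same_install A B: succeed if both executables live in the same directory.
# Hypothetical helper for illustration.
same_install() {
    [ "$(dirname "$1")" = "$(dirname "$2")" ]
}

if same_install "$PSIBLAST" "$BLASTDBCMD"; then
    echo "psiblast and blastdbcmd come from the same BLAST+ directory"
else
    echo "warning: psiblast and blastdbcmd come from different directories"
fi
```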

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by genomax68k

Sorry, that is a bit long. I got most of it but I had to cut out some of the useless stuff like the copyright

ADD REPLYlink written 10 weeks ago by williamsbrian5064170

Looking at it (and without knowing what exactly provean does with it), I got the impression that BLAST_DB="BLAST_DB=/gpfs_common/share01/kmmeurs/provean/" needs to point to the BLAST DB itself, so you will indeed have to add nr to it. (But then we're back to square one, right? :/ )
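
The posted value also repeats the variable name inside the string, so the name handed to provean with -d would start with the literal text BLAST_DB=. The error quoted in the question prints the database as [BLAST_DB=/path/to/...], which would be consistent with that, although nobody in the thread confirms it. A sketch of cleaning the value up; strip_db_prefix is a hypothetical helper name.

```shell
#!/bin/bash
# Value as posted in provean.sh: the variable name is repeated inside the value.
BLAST_DB="BLAST_DB=/gpfs_common/share01/kmmeurs/provean/"

# strip_db_prefix VALUE: drop a leading "BLAST_DB=" and a trailing slash.
# Hypothetical helper for illustration.
strip_db_prefix() (
    value="${1#BLAST_DB=}"
    value="${value%/}"
    printf '%s\n' "$value"
)

# Append the database base name (nr), as suggested in the reply above.
BLAST_DB="$(strip_db_prefix "$BLAST_DB")/nr"
echo "$BLAST_DB"   # prints /gpfs_common/share01/kmmeurs/provean/nr
```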

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by lieven.sterck5.4k

Perhaps we need to uncouple the blast thing from the provean thing altogether.

Can you run a blast against that nr DB simply from the command line? (Just take some small protein file and do something like blastp -query <file> -db /gpfs_common/share01/kmmeurs/provean/nr )
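
A sketch of that standalone test, assuming the BLAST+ programs blastdbcmd and blastp are on PATH (BLAST+ takes the query file with -query). The query file name is hypothetical, and the run wrapper is a hypothetical dry-run helper: it only prints each command unless RUN=1 is set, so nothing touches the database by accident.

```shell
#!/bin/bash
DB="/gpfs_common/share01/kmmeurs/provean/nr"   # database path from the reply above
QUERY="small_protein.fasta"                    # hypothetical small protein FASTA file

# run CMD...: execute the command when RUN=1, otherwise just print it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# 1. Ask BLAST+ whether it can open the database at all.
run blastdbcmd -db "$DB" -info

# 2. Search the small query against it, tabular output, a handful of hits.
run blastp -query "$QUERY" -db "$DB" -outfmt 6 -max_target_seqs 5
```

If the first command already reports the same "No alias or index file found" error, the problem lies with the database files or the BLAST+ version rather than with provean.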

ADD REPLYlink written 10 weeks ago by lieven.sterck5.4k