I am trying to download provean to run on the cluster at my university, but I am running into an issue. When I run the example to test whether everything is working correctly, I get the following error:
BLAST Database error: No alias or index file found for protein database [BLAST_DB=/path/to/provean/provean-1.1.5/nr]
in search path [/path/to/provean/provean-1.1.5::]
I saw that a lot of people had a similar issue, but mine is a bit different because I am not using blast directly; provean calls blast.
When I first tried downloading the database they used in their initial paper, the first thing I noticed was that there was no alias or index file in this database:
ls
nr.00.phd nr.00.pog nr.01.phd nr.01.pog nr.02.phd nr.02.pog nr.03.phd nr.03.pog nr.04.phd nr.04.pog nr.05.phd nr.05.pog nr.pal
nr.00.phi nr.00.ppd nr.01.phi nr.01.ppd nr.02.phi nr.02.ppd nr.03.phi nr.03.ppd nr.04.phi nr.04.ppd nr.05.phi nr.05.ppd
nr.00.phr nr.00.ppi nr.01.phr nr.01.ppi nr.02.phr nr.02.ppi nr.03.phr nr.03.ppi nr.04.phr nr.04.ppi nr.05.phr nr.05.ppi
nr.00.pin nr.00.psd nr.01.pin nr.01.psd nr.02.pin nr.02.psd nr.03.pin nr.03.psd nr.04.pin nr.04.psd nr.05.pin nr.05.psd
nr.00.pnd nr.00.psi nr.01.pnd nr.01.psi nr.02.pnd nr.02.psi nr.03.pnd nr.03.psi nr.04.pnd nr.04.psi nr.05.pnd nr.05.psi
nr.00.pni nr.00.psq nr.01.pni nr.01.psq nr.02.pni nr.02.psq nr.03.pni nr.03.psq nr.04.pni nr.04.psq nr.05.pni nr.05.psq
I am new to blast and wasn't sure how to make an index, but I saw that another piece of software that I used had a database that I could use:
ls
nr nr.00.psi nr.01.psd nr.02.pni nr.03.pnd nr.04.pin nr.05.phr nr.05.psq nr.06.psi nr.07.psd nr.08.pni nr.09.pnd
nr.00.phr nr.00.psq nr.01.psi nr.02.psd nr.03.pni nr.04.pnd nr.05.pin nr.06.phr nr.06.psq nr.07.psi nr.08.psd nr.09.pni
nr.00.pin nr.01.phr nr.01.psq nr.02.psi nr.03.psd nr.04.pni nr.05.pnd nr.06.pin nr.07.phr nr.07.psq nr.08.psi nr.09.psd
nr.00.pnd nr.01.pin nr.02.phr nr.02.psq nr.03.psi nr.04.psd nr.05.pni nr.06.pnd nr.07.pin nr.08.phr nr.08.psq nr.09.psi
nr.00.pni nr.01.pnd nr.02.pin nr.03.phr nr.03.psq nr.04.psi nr.05.psd nr.06.pni nr.07.pnd nr.08.pin nr.09.phr nr.09.psq
nr.00.psd nr.01.pni nr.02.pnd nr.03.pin nr.04.phr nr.04.psq nr.05.psi nr.06.psd nr.07.pni nr.08.pnd nr.09.pin nr.pal
You can see the index now (nr), or at least I believe that is the index, but there are fewer files now :(
Here is how I am configuring provean:
./configure PSIBLAST=/path/to/psiblast CDHIT=/path/to/cd-hit BLASTDBCMD=/path/to/blastdbcmd BLAST_DB=/path/to/provean/provean-1.1.5/nr --prefix=/path/provean/provean-1.1.5
make
make install
When I run the example:
/path/to/provean.sh -q /path/to/examples/P04637.fasta -v /path/to/examples/P04637.var --save_supporting_set /path/to/examples/P04637.sss
I get the error above:
BLAST Database error: No alias or index file found for protein database [BLAST_DB=/path/to/provean/provean-1.1.5/nr]
in search path [/path/to/provean/provean-1.1.5::]
I know I am making some kind of mistake but I am not sure where :(
The files could be stored in the wrong place?
provean is looking for a version of BLAST that is of a similar vintage (from 2011).
provean or you downloaded it separately?
$PATH that is causing the error (if provean came with required older blast)?
I assume /path/to placeholders refer to real paths on your cluster.
ncbi-blast-2.8.1+. I am not sure if it is an issue with the version of blast. I emailed them about the software not too long ago and they didn't mention any issues with the version of blast.
/path/to. I can add the full path if that makes it easier. Some of the lines were getting a bit long.
Like I had said in the last thread, trying to use a much newer version of BLAST with indexes from 2011 (ftp://ftp.jcvi.org/pub/data/provean/nr_Aug_2011/) may be one of the problems.
I assume provean has not been updated since 2011, so whether it can use the latest blast+ and the latest set of indexes is somewhat doubtful. It sounds like you are doing the right thing but getting that error.
One solution may be to find an old version of blast (from around 2011) and give it a whirl with the old indexes. It just depends on how desperate you are to run provean. It may be easier to find a newer solution.
Using a new version of BLAST could be the issue. I may give an older version of BLAST a try. I am glad that it seems like I am doing everything right so far. I've been picking people's brains on this site for a bit now, and I feel like I am picking up on things.
I am just trying to get a couple of SNP predictors to run on my cluster to analyze thousands of SNPs. Do you have any suggestions for newer alternatives?
How about using snpEff or variant effect predictor then? You have bacterial samples?
Those are good and I use both of them. VEP has the option to run sift, which will analyze your SNPs for you. It will just give the SNP a score to determine if it is predicted to be damaging or benign. I just wanted to get maybe one more predictor running so I was looking into provean. I am working with canine samples.
Annovar (http://annovar.openbioinformatics.org/en/latest/).
That is a good one! I don't think they do the prediction for dogs. When I was looking at it, it looked like it was just for humans. I was honestly struggling to figure out what other species they support other than the ones they list on the homepage.
Is this BLAST_DB=BLAST_DB=/path/to/provean/provean-1.1.5/nr a typo or is that what you actually used in your command line?
See the last bullet. That is a placeholder: C: No alias or index file found for protein database
Sorry, that was also a typo. Great catch though!!
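The doubled BLAST_DB= prefix asked about here also appears inside the brackets of the error message quoted in the question, so it may be worth checking the configured value mechanically. The following is only a minimal sketch: `check_db_value` is a hypothetical helper name, not part of provean or BLAST, and the pattern simply flags any value that starts with a letter or underscore and contains an equals sign.

```shell
# Hypothetical helper (not part of provean or BLAST): flag a database
# value that still looks like an assignment, e.g. "BLAST_DB=/path/to/nr".
check_db_value() {
    cdv_value="$1"
    case "$cdv_value" in
        [A-Za-z_]*=*)
            # Starts with a letter or underscore and contains '=',
            # so it is probably "NAME=value" rather than a plain path.
            echo "suspicious database value: $cdv_value" >&2
            return 1
            ;;
        *)
            return 0
            ;;
    esac
}
```

For example, `check_db_value "BLAST_DB=/path/to/provean/provean-1.1.5/nr"` returns a non-zero status, while the same path without the prefix passes. A relative path that happens to contain an equals sign would also be flagged, so treat a warning as a hint only.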
Can you set BLAST_DB to the top level directory where nr files are and see if that helps? See below. You may need to make clean and make again.
BLAST_DB=/path/to/provean/provean-1.1.5/
I had a similar thought and tried that but no luck :(
Was worth a try 😉. Good to hear you have already tried it.
Do you think it is worth making my own database? I am just not sure what they used to make the nr database. I guess they used the human genome. Do you know much about this?
It does not look like they are doing anything special (see this page). Just the latest nr direct from NCBI should work along with latest blast+. Have you tried downloading the latest nr indexes from NCBI? The nr database is now significantly larger so be aware of that.
Since the provean web page does not have a time stamp, it is difficult to know when it was last updated.
Back when I talked to the people who are "in charge" of provean, they sent me the latest, and it was still from 2014-2015, so still sort of old.
I did look at the indexes and I did see how big they are now. That is why I wanted to try the database they listed, because it was much smaller.
See if your cluster has the latest nr. On shared compute, admins may download them regularly so there are no multiple copies. Otherwise bite the bullet and get the new indexes.
I would normally say to stick with @lieven's suggestion below, but the databases provean provides are just too old.
I think I might have to bite the bullet :/
If it was on the cluster someone would have downloaded it themselves and hidden it away.
What genomax is pointing to is that there might be a shared location (i.e. one that the whole group or department has access to, so that everyone uses the same blast DB indices).
I would suggest first getting this working before downloading the newest version of the blast DBs (those are huge and will likely take you a few hours (a day?) to download). On the other hand, you might need them anyway, so yes, perhaps download them in parallel.
Yeah, it doesn't look like there is one or at least it isn't documented that there is one.
Good point though. Downloading the newest version may not help if I can't get the software working in the first place. I will see if I can track down the version that was used when they tested the software.
Is your nr DB actually located in that /path/to/provean/provean-1.1.5/ folder? And accessible?
Yes, that is correct. I originally had it in its own folder but moved it to /path/to/provean/provean-1.1.5/ because I thought maybe the software was looking for the index in /path/to/provean/provean-1.1.5/ but wasn't finding the index.
On a side note: the first ls you did of the nr DB folder looks OK! It should not specifically contain a file called nr. Blast will look for the nr.pal file or for the basename of all the *p.. files.
I would personally stick to the DB version that comes with the tool and not (yet) change it to a differently versioned DB.
EDIT: stick to the provided one until this issue is resolved and then of course try to update (as pointed out by genomax, the DB that comes with the tool is way outdated by now).
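The point that blast resolves a database by its basename (nr.pal, or the per-volume index files) rather than by a file literally called nr can be checked by hand before involving provean at all. This is a rough sketch, not anything shipped with BLAST: `blast_protein_db_found` is a made-up helper name, and it only tests for the protein alias (.pal) and index (.pin) files next to the given prefix.

```shell
# Hypothetical helper: report whether a protein BLAST database prefix
# (for example /some/dir/nr) has an alias file (.pal) or an index file (.pin).
blast_protein_db_found() {
    bpd_prefix="$1"
    if [ -f "${bpd_prefix}.pal" ]; then
        # Multi-volume databases are tied together by the alias file.
        echo "alias file found: ${bpd_prefix}.pal"
        return 0
    elif [ -f "${bpd_prefix}.pin" ]; then
        # Single-volume databases have the index directly at the prefix.
        echo "index file found: ${bpd_prefix}.pin"
        return 0
    fi
    echo "no alias (.pal) or index (.pin) file for prefix: ${bpd_prefix}" >&2
    return 1
}
```

Called as `blast_protein_db_found /path/to/provean/provean-1.1.5/nr`, it should succeed for a directory like the first listing in the question. Passing only the directory, or a value with a stray `BLAST_DB=` in front, fails because no file exists at that prefix.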
Oh, I didn't know that. Thanks for letting me know. That is good to know!
Long shot, but try setting this to the provean path: BLASTDB=/path/to/ (no underscore in it)
No luck :/ still getting the same thing
If it's not too long, can you post the content of the provean.sh file you are using?
So ignore the path to BLAST_DB, that was me just trying to see if I could get anything to work. I think I am going to try to just run blast and see if that is working. I was thinking that if I can't even get blast to work by itself, then maybe that is the underlying issue here.
If that is a real path then the app is configured to use a really OLD version of blast.
That is a great catch! Remember we talked about using the version of BLAST that was used in their original paper with their original 2011 database? That is why you see that old version. Actually, I get the same error as before, so I am starting to think I am making some kind of error. I didn't get a chance to play with this today but I'm going to look into it this weekend.
When you do work on this again, make sure those configuration variables point cohesively to either new or old set of blast executables/databases. No mixing allowed :-)
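One way to check the "no mixing" advice is to compare the versions reported by the executables that the configure step points at. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and it assumes each BLAST+ tool accepts `-version` and prints `name: version` on its first line.

```shell
# Hypothetical helper: print the version field from the first line of
# `<tool> -version` (assumed to look like "psiblast: 2.2.31+").
blast_tool_version() {
    "$1" -version 2>/dev/null | head -n 1 | awk '{print $2}'
}

# Succeeds only when both tools report the same, non-empty version string.
same_blast_release() {
    sbr_v1=$(blast_tool_version "$1")
    sbr_v2=$(blast_tool_version "$2")
    [ -n "$sbr_v1" ] && [ "$sbr_v1" = "$sbr_v2" ]
}
```

For instance, `same_blast_release /path/to/psiblast /path/to/blastdbcmd` would fail if one path points at an old install and the other at a new one. It says nothing about whether the database files match that release, so that still needs a manual check.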
I added a comment below if you were curious about how I resolved this problem! Thank you so much for your help, I really appreciate the time you and lieven.sterck took to help me.
Sorry, that is a bit long. I got most of it but I had to cut out some of the useless stuff like the copyright.
Looking at it (and without knowing what exactly provean does with it), I got the impression that BLAST_DB="BLAST_DB=/gpfs_common/share01/kmmeurs/provean/" needs to point to the blast DB itself, so you will indeed have to add nr to it. (But then we're back to square one, right? :/)
Perhaps we need to uncouple the blast thing from the provean thing altogether.
Can you run a blast against that nr DB simply from the command line? (Just take some small protein file and do something like blastp -query <file> -db /gpfs_common/share01/kmmeurs/provean/nr)
I am not sure of the best way to reach the two of you. I just wanted to thank you two for your help. I was able to figure this problem out (I had to put this project on the back burner). So what I did was use conda to download BLAST and CD-HIT. I figured that would simplify things, but I noticed that I was getting the same errors. So I did what you two suggested and reverted to what was used in the original paper. Provean suggests not using v4.6 or v4.6.1 of CD-HIT. Configuring older versions of CD-HIT was giving me issues related to my version of gcc, so I figured it would be easier to play with the version of BLAST that I was using. I started with 2.2.25, which is what they originally used with Provean. This sort of worked, but the software would crash, and the crash was related to the version of BLAST I was using. I was having a similar issue to this post. I saw that if you use version 2.2.31, the problem would be solved. After updating to version 2.2.31 everything worked!!!!! So exciting! I am not entirely sure why the newest version of BLAST doesn't work, but it seems that using version 2.2.31 of BLAST and a newer version of CD-HIT (avoiding v4.6 or v4.6.1) solves the problem. The version of CD-HIT that I am using is v4.8.1.
The command that I ended up using was