Question: Downloading and Maintaining a Local, BLAST-able nr Database
Anjan (United States) wrote, 9.6 years ago:

I am planning to set up and maintain a local version of the NR and other NCBI databases for running in-house BLAST searches. I would also like my local copies of the databases to stay in sync with NCBI through regular updates. NCBI suggests using update_blastdb.pl (http://www.ncbi.nlm.nih.gov/blast/docs/update_blastdb.pl) to download the latest versions of all the pre-formatted databases. Does anyone have experiences to share on using this script? Are there alternative solutions? I would appreciate everyone's feedback. Thanks, Anjan

(modified 4.3 years ago by conchoecia)

It's fine to keep updating the BLAST databases, but then you need to document which release you used for each analysis, right? With a new database, you might get slightly different hits...

(reply by lexnederbragt, 9.6 years ago)

Ah yes, I believe running fastacmd with the -I option returns the version of the database, so fastacmd -d $HOME/blastdb/nt -I returns the version of the nt database. This output can easily be tacked onto the end of a BLAST report to keep track of the database version.

(reply by Anjan, 9.6 years ago)
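With BLAST+, fastacmd has been replaced by blastdbcmd, and blastdbcmd -db <db> -info prints a summary of the database. A minimal sketch, assuming BLAST+ is installed and on the PATH, of a helper (the name append_db_info is hypothetical) that tacks that summary onto the end of a report:

```bash
# Sketch: append the summary printed by "blastdbcmd -info" (the BLAST+
# counterpart of the legacy "fastacmd -I") to an existing BLAST report.
# Assumes BLAST+ is installed; the helper name is hypothetical.
append_db_info() {
    local report=$1 db=$2
    {
        echo "# ---- BLAST database info: $db (recorded $(date +%F)) ----"
        blastdbcmd -db "$db" -info
    } >> "$report"
}

# Example (hypothetical paths):
# append_db_info results/query1.blastp.txt "$HOME/blastdb/nr"
```

Appending rather than writing a separate file keeps the database description physically attached to the hits it produced.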

+1 @flxlex: Agree. The first thing any script should do is get and log the version of the database used before pulling data; the same goes for any other data source.

(reply by Blunders, 9.6 years ago)

I like the idea of maintaining a log of updates. The script does not create one, but it should not be difficult to start one.

(reply by Anjan, 9.6 years ago)
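A log of updates could look something like this: a hypothetical wrapper, run_logged_update, that calls update_blastdb.pl and appends one line per run (timestamp, databases requested, exit status) to a log file. The log file names are assumptions, and update_blastdb.pl is assumed to sit in the current directory.

```bash
# Hypothetical helper: run update_blastdb.pl and append one tab-separated
# line per run (timestamp, databases, exit status) to $UPDATE_LOG.
# The default file names are assumptions; override UPDATE_LOG as needed.
UPDATE_LOG=${UPDATE_LOG:-blastdb_updates.log}

run_logged_update() {
    local status
    # Full verbose output goes to a separate details file next to the log.
    perl update_blastdb.pl --verbose "$@" >> "$UPDATE_LOG.details" 2>&1
    status=$?
    printf '%s\t%s\texit=%d\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" "$status" >> "$UPDATE_LOG"
    return "$status"
}

# Example: run_logged_update nr taxdb
```

Keeping the one-line summary separate from the verbose output makes the update history easy to scan.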

[EDIT] @Anjan: The idea is not to log the updates when they are installed, but to store the version of BLAST used to produce the results together with the result data. If you have any questions, just comment again. Cheers!

(reply by Blunders, 9.6 years ago)

+1 @Anjan: Cool, thanks for posting the command lines, and glad you were able to figure out what I was trying to say. Cheers!

(reply by Blunders, 9.6 years ago)

You may want to try this: http://www.dnabaser.com/download/NCBI-BLAST-downloader/

(reply by BioApps, 5.6 years ago)
Neilfws (Sydney, Australia) wrote, 9.6 years ago:

NCBI used to provide a method for incremental update of local databases. Its disadvantage was that the local and remote copies diverged over time. It looks like they've abandoned this approach with the new update script.

The update_blastdb.pl script looks fine. All it does is download the pre-formatted BLAST databases, if the local copies are either absent or older than the remote copies. I would just give it a try; if it's not to your liking, it's easy to implement something similar using any scripting language.

You should also decide how often you want to check for updates: daily, weekly, monthly? - and set up a cron job to automate the process. Here's one tutorial, or else just search the web for "cron tutorial".

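The core decision described here (download only if the local copy is absent or older than the remote one) is easy to reproduce. A minimal sketch of just that check in bash, assuming GNU stat and that you already have the remote file's modification time as seconds since the epoch; the function name needs_download is hypothetical, and fetching the remote timestamp is left out.

```bash
# Sketch: return success if $1 is missing or older than the remote copy, whose
# modification time ($2) is given in seconds since the epoch. How you obtain
# that remote time is left open. Uses GNU stat ("stat -f %m" on BSD/macOS).
needs_download() {
    local local_file=$1 remote_epoch=$2 local_epoch
    [[ -e $local_file ]] || return 0       # absent: download
    local_epoch=$(stat -c %Y "$local_file")
    (( local_epoch < remote_epoch ))       # older than remote: download
}

# Example (hypothetical variable):
# needs_download nr.00.tar.gz "$remote_mtime" && echo "nr.00 needs updating"
```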

It will download the pre-formatted database files to whichever directory you specify. That should be all the "installation" required. When running BLAST, you either specify the path to the database files or define it in a configuration file.

(reply by Neilfws, 9.6 years ago)
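For BLAST+, the two usual options are the BLASTDB environment variable or a .ncbirc configuration file with a [BLAST] section. A small sketch that writes such a file; the helper name and the default location (your home directory) are assumptions, and it refuses to overwrite an existing file.

```bash
# Sketch: create a BLAST+ .ncbirc that points BLASTDB at a local database
# directory. Writes to $HOME unless another directory is given, and refuses
# to overwrite an existing file. Helper name and paths are hypothetical.
write_ncbirc() {
    local db_dir=$1 rc_file=${2:-$HOME}/.ncbirc
    if [[ -e $rc_file ]]; then
        echo "$rc_file already exists; edit it by hand" >&2
        return 1
    fi
    printf '[BLAST]\nBLASTDB=%s\n' "$db_dir" > "$rc_file"
}

# Example: write_ncbirc /data/blastdb
# Or, for the current shell session only: export BLASTDB=/data/blastdb
```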

+1 @neilfws: Much more relevant answer. One question, though: does update_blastdb.pl install the updates, or just download them?

(reply by Blunders, 9.6 years ago)

+1 @neilfws: Thanks for the clarification.

(reply by Blunders, 9.6 years ago)

No installation is required. However, you have to untar and unzip the files and then get rid of the .tar.gz archives; none of this is done by the script. Again, not a difficult task to code.

(reply by Anjan, 9.6 years ago)

I wonder what the preferred way is to deal with BLAST jobs that are still running when the scheduled update_blastdb.pl run starts?

(reply by Carlos Borroto, 8.0 years ago)

I have added a loop to my Perl script that checks the list of running processes for any active BLAST runs. If a BLAST job is detected, the script sleeps for 2 minutes, wakes up, and resamples the process list. Here is the code snippet:

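# Poll the process list: if a BLAST job is running, sleep for 120 s and
# resample; otherwise leave the loop. top -b -n1 prints one batch snapshot.
# Note: /blastall|blast/ matches any process name containing "blast",
# including this script if its own name contains that word.
while (1) {
    my $status = `top -b -n1`;
    if ($status =~ /blastall|blast/) {
        sleep(120);
        next;
    }
    else {
        last;
    }
}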

HTH

(reply by Anjan, 8.0 years ago; modified 13 months ago by RamRS)
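A shell counterpart of the same polling loop, sketched with pgrep -x, which only matches whole process names from an explicit list. The list of BLAST program names and the function name wait_for_blast are assumptions; extend the list to the programs you actually run.

```bash
# Sketch: block while any listed BLAST program is running, re-checking every
# $1 seconds (default 120, as in the Perl loop). pgrep -x matches whole
# process names only. The program list is an assumption; extend as needed.
BLAST_PROCS='blastall|blastn|blastp|blastx|tblastn|tblastx|psiblast|rpsblast'

wait_for_blast() {
    local interval=${1:-120}
    while pgrep -x "$BLAST_PROCS" > /dev/null; do
        sleep "$interval"
    done
}

# Example: wait_for_blast && perl update_blastdb.pl --passive nr
```

Matching exact names avoids false positives from any process that merely contains "blast" somewhere in its name.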
Blunders wrote, 9.6 years ago:

You may already have seen these pages, but since you didn't link to them, I'm posting them:

As for syncing, I'd suggest finding a way to monitor this page and get an email alert when it changes (I was unable to find an official email alert for updates): ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/ChangeLog

Upon getting an email alert, I'd manually review the updates posted - then do a build if needed.


The update_blastdb.pl script checks whether the remote files are newer than the local; I don't think email alert is necessary.

(reply by Neilfws, 9.6 years ago)
Jan Kosinski wrote, 9.6 years ago:

You may also try this: http://dunbrack.fccc.edu/BioDownloader/BioDownloader.php

I have never tried it, as it runs only under Windows, but perhaps you can run it using Wine on Linux.

I have also used update_blastdb.pl successfully, as follows (as a shell script run from crontab):

echo "downloading nr"
cd /home2/db/blast; nice -n +15 ./update_blastdb.pl --passive --timeout 300 --force --verbose nr &> nr.updatedb.log
echo 'untaring nr'
tar -xzvf nr.00.tar.gz &>nr.00.tar.log
tar -xzvf nr.01.tar.gz &>nr.01.tar.log
tar -xzvf nr.02.tar.gz &>nr.02.tar.log
tar -xzvf nr.03.tar.gz &>nr.03.tar.log

rm nr.00.tar.gz &>nr.00.rm.log
rm nr.01.tar.gz &>nr.01.rm.log
rm nr.02.tar.gz &>nr.02.rm.log
rm nr.03.tar.gz &>nr.03.rm.log

Thank you Jan, this is the most complete solution I have come across. You even have a log!

(reply by Anjan, 9.6 years ago)

No problem, but keep in mind that if a new nr.03.tar.gz appears, it will be downloaded but not extracted. So it would probably be better to wrap the extraction in a shell 'for' loop. I was checking this manually (it does not happen that often) and adding new lines when necessary ;-)

You may try (the archive is only removed if tar succeeded): for file in nr.??.tar.gz; do tar -zxvf "$file" &> "$file.tar.log" && rm "$file" &> "$file.rm.log"; done

(not tested; there may be typos)

(reply by Jan Kosinski, 9.6 years ago)

Sorry, I meant "new nr.04.tar.gz".

(reply by Jan Kosinski, 9.6 years ago)
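The loop idea can be extended so that it verifies each volume before extracting it and deletes an archive only if extraction succeeded. A sketch, assuming each archive has a matching .md5 checksum file in md5sum format next to it (as on NCBI's FTP site) and that it is run from the download directory; the function name extract_volumes is hypothetical.

```bash
# Sketch: extract every volume of one database ($db.*tar.gz, however many
# there are), verifying each archive's checksum first and deleting it only
# after a successful extraction. Assumes NCBI-style "<archive>.md5" files in
# md5sum format; archives without one are extracted unverified.
# Run from the directory that holds the downloads.
extract_volumes() {
    local db=$1 archive failed=0
    shopt -s nullglob             # no matches -> loop body never runs
    for archive in "$db".*tar.gz; do
        if [[ -f $archive.md5 ]] && ! md5sum -c --status "$archive.md5"; then
            echo "checksum mismatch, skipping: $archive" >&2
            failed=1
            continue
        fi
        if tar -xzf "$archive" >> "$archive.log" 2>&1; then
            rm -f "$archive" "$archive.md5"
        else
            echo "extraction failed, keeping: $archive" >&2
            failed=1
        fi
    done
    shopt -u nullglob
    return "$failed"
}

# Example: extract_volumes nr && extract_volumes taxdb
```

Because the glob picks up whatever volumes are present, a newly added volume is handled without editing the script.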
ahmed_abdullah (Cairo) wrote, 4.6 years ago:

I had the same problem two days ago, and here is what I did:

  1. Install NCBI BLAST on your OS.
  2. Download the update_blastdb.pl script, which updates your local databases.
  3. Download the database using the following command line:

    $ perl update_blastdb.pl --passive nt

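These steps can be wrapped in a short preflight check that confirms the tools are present before a long download starts. A sketch; the function name and the choice of tools to check (perl, blastp, blastdbcmd) are assumptions.

```bash
# Sketch: check that the tools from the steps above are available before a
# long download. The tool list and the default script path are assumptions.
preflight() {
    local script=${1:-./update_blastdb.pl} missing=0 tool
    for tool in perl blastp blastdbcmd; do
        if ! command -v "$tool" > /dev/null; then
            echo "missing command: $tool" >&2
            missing=1
        fi
    done
    if [[ ! -f $script ]]; then
        echo "missing script: $script" >&2
        missing=1
    fi
    return "$missing"
}

# Usage, mirroring step 3:
# preflight && perl update_blastdb.pl --passive nt
```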
Adrian Pelin (Canada) wrote, 6.8 years ago:

Okay, how can we know when the pre-formatted database was last updated? For instance, is there a changelog file where NCBI states when the nr/nt databases on their FTP site were last updated?


AFAIK, NCBI does a weekly release of data every Monday. HTH.

(reply by Anjan, 6.8 years ago)
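If you expect a weekly release, a rough local staleness check is to look at the modification times of the downloaded database files. A sketch using GNU find; the helper name db_older_than is hypothetical, and file modification times only approximate the actual release date.

```bash
# Sketch: succeed if any file of database $db in $db_dir was last modified
# more than $days days ago (find truncates ages to whole days, so +7 means
# 8 or more). Requires GNU find for -quit; returns false if no files match.
db_older_than() {
    local db_dir=$1 db=$2 days=${3:-7}
    [[ -n $(find "$db_dir" -maxdepth 1 -name "$db.*" -mtime +"$days" -print -quit) ]]
}

# Example (hypothetical path):
# db_older_than /data/blastdb nr 7 && echo "local nr is more than a week old"
```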

Hi Adrian. I wrote a tool that runs at computer startup and checks whether the local databases are out of date. Version 2 of this tool will be available by the end of tomorrow.

http://www.dnabaser.com/download/NCBI-BLAST-downloader/

(reply by BioApps, 5.6 years ago)

This looks really convenient :) Unfortunately, I can only use it on my home PC, which runs MS Windows; I will try it and leave feedback. Can I contact you here: http://www.dnabaser.com/download/nextgen-fastq-editor/contact.html ? As always, I must recommend that you release the source :) or at least port it to Java so that it's OS-independent.

(reply by Adrian Pelin, 5.6 years ago)

Hi Adrian. Yes, that's the right link for contacting me.

About the port: the program was written in Delphi. Some months ago I upgraded my license to Delphi 21, which can build for Windows, OS X, iOS and Android (and I think some other platforms too, but not Linux). I am still figuring out how this works :) So there will be a Mac port quite soon. Linux support will come when Delphi supports it. But since bioinformaticians are exclusively on Linux and DON'T need my tool, Linux is not a priority anyway.

(reply by BioApps, 5.6 years ago)
conchoecia (Santa Cruz, CA) wrote, 4.3 years ago:

I made a script that checks whether a BLAST job is currently running, waits until it is done, deletes the old databases, then downloads the new ones and moves them into directories with the same names. The script writes everything to a dated log for archival purposes. Does anyone have suggestions for improvements?

I made this into a cron job by typing

crontab -e

...and adding this line.

0 3 1 1,4,7,10 * /<your directory to>/<the script and update_blastdb.pl>/update_blast.sh

The line above sets up a cron job (five time fields: minute, hour, day of month, month, day of week) that runs the script at 3 AM on the 1st of January, April, July, and October. So you get quarterly updates!

#!/bin/bash

# Name this file "update_blast.sh" and put it in the same directory as your
# "update_blastdb.pl" file. The nr database will be saved to "nr/", and the taxdb
# will be saved to "taxdb/" in the same directory. Run this script via the
# terminal or via a cron job.

# cron format http://www.nncron.ru/help/EN/working/cron-format.htm
# http://askubuntu.com/questions/2368/how-do-i-set-up-a-cron-job

# Change the working directory to the directory containing this script
cd "$(dirname "$0")" || exit 1

# Conditional shell scripts: http://askubuntu.com/questions/157779
#bash wait for process to start

# Define a timestamp function
timestamplong() {
  date +"%Y%m%d_%H-%M-%S" 
}

timestampshort() {
  date +"%Y%m%d"
}

logfile="$(timestampshort)_blastupdate.log"

# Wait until BLAST is done before starting

echo "# $(timestamplong) Log file created. Attempting to update blast and taxdb." >> $logfile 2>&1
echo "# $(timestamplong) Waiting until blast processes are done before continuing." >> $logfile 2>&1

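# pgrep -x matches whole process names, so neither this script nor the check
# itself can match. Extend the list if you run other BLAST programs.
while pgrep -x 'blastall|blastn|blastp|blastx|tblastn|tblastx|psiblast|rpsblast' > /dev/null
do
    sleep 1
done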

echo "# $(timestamplong) Blast processes complete. Proceeding with download." >> $logfile 2>&1


echo "# $(timestamplong) Deleting current nr and taxdb databases." >> $logfile 2>&1
rm -rf nr/ 
rm -rf taxdb/ 

echo "# $(timestamplong) Starting updateblast script." >> $logfile 2>&1
perl update_blastdb.pl --verbose --decompress nr taxdb >> $logfile 2>&1

echo "# $(timestamplong) Moving the taxdb and nr databases to nr/ and taxdb/" >> $logfile 2>&1
mkdir taxdb
mv taxdb.* taxdb
mkdir nr
mv nr.* nr

echo "# $(timestamplong) Update complete! nr and taxdb are now the most recent versions." >> $logfile 2>&1
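One possible improvement: as written, the script deletes nr/ and taxdb/ before downloading, so a failed download leaves no database at all. A sketch of the alternative of downloading into a staging directory and swapping it in only afterwards. The function swap_in and the staging_nr and .old directory names are hypothetical, and the usage line assumes update_blastdb.pl downloads into its working directory and exits non-zero on failure.

```bash
# Sketch: replace directory $live with freshly downloaded $staging, keeping
# the old copy until the swap has succeeded. Names are hypothetical.
swap_in() {
    local staging=$1 live=$2
    [[ -d $staging ]] || return 1          # nothing downloaded: keep current db
    rm -rf "$live.old"
    if [[ -e $live ]]; then
        mv "$live" "$live.old" || return 1
    fi
    if ! mv "$staging" "$live"; then
        [[ -e $live.old ]] && mv "$live.old" "$live"   # roll back
        return 1
    fi
    rm -rf "$live.old"
}

# Usage sketch, from the directory holding update_blastdb.pl:
# rm -rf staging_nr && mkdir staging_nr &&
#     (cd staging_nr && perl ../update_blastdb.pl --decompress nr) &&
#     swap_in staging_nr nr
```

The trade-off is disk space: old and new copies coexist briefly, which for nr is substantial.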