Local Mirror Of Biological Databases
13.1 years ago
pufferfish ▴ 290

I'd like to set up a local mirror of certain large databases, such as the nt BLAST database, InterPro, etc.

The biomirror project looks like a good candidate, but they seem to advocate using GridFTP, and have even deprecated rsync. I would have thought a simpler solution would be something hacked together with cron and rsync, or am I missing something?

So, my question is: What solutions have you used for mirroring large biological databases, and what mistakes should I avoid making?

blast database • 3.0k views
13.1 years ago
Torst ▴ 980

I just use a simple shell script in the system's cron.daily folder and use the "mirror" command of "lftp". Here is one which mirrors just the virus and bacteria genomes into my local folder called "/bio/db/ncbigenomes/". You will have to adjust the source and destination folders, and the HOST variable, to point to your local biomirror.

#!/bin/sh
#
# Install as a daily cron job:
#   sudo vi /etc/cron.daily/biomirror
#
HOST=biomirror.aarnet.edu.au
for G in Viruses Plasmids Bacteria Bacteria_DRAFT ; do
        # "mirror --delete" also removes local files that are gone from the server
        lftp -c "open ftp://$HOST/ ; mirror --delete \
          /biomirror/ncbigenomes/$G /bio/db/ncbigenomes/$G"
done
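
For a mirror that still offers rsync, the cron-and-rsync approach mentioned in the question could look roughly like the sketch below. The source URL and destination path are placeholders, not real endpoints, and the script only prints the command unless DRY_RUN=0 is set, so it is safe to try before pointing it at a real server.

```shell
#!/bin/sh
# Hypothetical values: replace with a mirror that actually offers rsync
# and with your own local folder.
SRC="rsync://mirror.example.org/blast/db/"
DEST="/bio/db/blast/"

# Mirror $1 into $2.
# -a         preserve timestamps and permissions
# --delete   remove local files that were dropped upstream
# --partial  keep partially transferred files so large downloads can resume
mirror() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only show what would be run.
        echo "rsync -a --delete --partial $1 $2"
    else
        rsync -a --delete --partial "$1" "$2"
    fi
}

mirror "$SRC" "$DEST"
```

Dropped into /etc/cron.daily/ like the lftp script above, and run with DRY_RUN=0, this would perform the actual transfer.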
13.1 years ago
Pasta ★ 1.3k

We have a local BLAST nt DB in our lab containing proprietary sequences and a mirrored DB. I wrote a script that updates the DB on a monthly basis; it is launched by cron. The script (written in PHP. Yes, I know ...) uses the E-utils functions (from NCBI) to query NCBI, parse the XML, and retrieve sequences of interest. It was quite fun to write, and I have something that works and does exactly what I want.
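
The query, parse, and retrieve steps can be sketched in shell (not the PHP of the script described above). This is a minimal illustration, not that script: the helper names are made up, the query term is a placeholder, and it assumes the esearch XML lists each result as an `<Id>` element.

```shell
#!/bin/sh
# Base URL of the NCBI E-utilities.
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Build an esearch URL for the nucleotide database.
# $1 = query term (already URL-encoded), $2 = maximum number of IDs to return.
esearch_url() {
    echo "$EUTILS/esearch.fcgi?db=nucleotide&term=$1&retmax=$2"
}

# Build an efetch URL that returns FASTA for a comma-separated ID list.
efetch_url() {
    echo "$EUTILS/efetch.fcgi?db=nucleotide&id=$1&rettype=fasta&retmode=text"
}

# Read esearch XML on stdin and print the IDs as one comma-separated line.
extract_ids() {
    grep -o '<Id>[0-9]*</Id>' | sed 's/<[^>]*>//g' | paste -sd, -
}

# Example use (needs network access and curl; YOUR+QUERY is a placeholder):
#   ids=$(curl -s "$(esearch_url 'YOUR+QUERY' 100)" | extract_ids)
#   curl -s "$(efetch_url "$ids")" >> new_sequences.fasta
```

A monthly cron entry would then call these helpers and append the fetched sequences to the FASTA file from which the local database is rebuilt.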
