Local Mirror Of Biological Databases
12.6 years ago
pufferfish ▴ 290

I'd like to set up a local mirror of certain large databases, such as the nt BLAST database, InterPro, etc.

The biomirror project looks like a good candidate, but they seem to advocate using GridFTP, and have even deprecated rsync. I would have thought a simpler solution would be something hacked together with cron and rsync, or am I missing something?

So, my question is: What solutions have you used for mirroring large biological databases, and what mistakes should I avoid making?

blast database • 2.9k views
12.6 years ago
Torst ▴ 980

I just use a simple shell script in the system's cron.daily folder that runs the "mirror" command of "lftp". Here is one which mirrors just the virus, plasmid and bacterial genomes into my local folder called "/bio/db/ncbigenomes/". You will have to adjust the SRC and DEST folders, and the $HOST variable to point to your local biomirror.

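#!/bin/sh
# Install as /etc/cron.daily/biomirror (e.g. sudo vi /etc/cron.daily/biomirror)
HOST=ftp.example.org          # placeholder: set to your local biomirror host
SRC=/biomirror/ncbigenomes    # remote folder on $HOST
DEST=/bio/db/ncbigenomes      # local folder
for G in Viruses Plasmids Bacteria Bacteria_DRAFT ; do
        lftp -c "open ftp://$HOST/ ; mirror --delete $SRC/$G $DEST/$G"
done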
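
One mistake to avoid with a daily cron job like the one above: with large databases a mirror run can take longer than a day, and two overlapping runs may then fight over the same files. A minimal sketch of a guard against that, assuming a lock directory is acceptable on your system (the names LOCKDIR and with_lock are made up for illustration and are not part of lftp or cron):

```shell
#!/bin/sh
# Sketch only: run a command unless a previous run still holds the lock.
# LOCKDIR and with_lock are hypothetical names chosen for this example.
LOCKDIR="${LOCKDIR:-/tmp/biomirror.lock}"

with_lock() {
    # mkdir either creates the directory or fails if it already exists,
    # so only one caller at a time can get past this point.
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "previous mirror run still active, skipping" >&2
        return 1
    fi
    "$@"                 # run the wrapped command
    status=$?
    rmdir "$LOCKDIR"     # release the lock
    return $status
}
```

In the script you would then call something like `with_lock lftp -c "open ftp://$HOST/ ; mirror ..."` in place of the bare lftp call. Note that if the job is killed mid-run the lock directory is left behind and has to be removed by hand.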
12.6 years ago
Pasta ★ 1.3k

We have a local BLAST nt database in our lab that holds proprietary sequences alongside the mirrored ones. I wrote a script that updates the database on a monthly basis; it is launched by cron. The script (written in PHP. Yes, I know ...) uses the NCBI E-utilities to query NCBI, parse the XML responses and retrieve the sequences of interest. It was quite fun to write, and I have something that works and does exactly what I want.
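
For readers who want to try the same idea, here is a rough shell sketch of the esearch-then-efetch pattern. It is not the PHP script described above: the function names and the example query are invented for illustration, the query encoding is deliberately naive (spaces only), and it assumes curl is installed.

```shell
#!/bin/sh
# Sketch only: query NCBI E-utilities for nucleotide records and fetch FASTA.
# All function names here are hypothetical helpers, not part of any NCBI tool.
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Build an esearch URL for a query term; $2 is the maximum number of hits.
esearch_url() {
    term=$(printf '%s' "$1" | tr ' ' '+')   # naive encoding: spaces only
    printf '%s/esearch.fcgi?db=nucleotide&retmax=%s&term=%s\n' \
        "$EUTILS" "${2:-20}" "$term"
}

# Read esearch XML on stdin and print one record id per line.
extract_ids() {
    tr '<' '\n' | sed -n 's/^Id>\([0-9][0-9]*\)$/\1/p'
}

# Build an efetch URL returning FASTA for a comma-separated id list.
efetch_url() {
    printf '%s/efetch.fcgi?db=nucleotide&rettype=fasta&retmode=text&id=%s\n' \
        "$EUTILS" "$1"
}

# Search, collect ids, fetch sequences. Needs network access.
# -g stops curl from treating [ ] in the query as a URL glob.
fetch_fasta() {
    ids=$(curl -fsSg "$(esearch_url "$1")" | extract_ids | paste -sd, -)
    [ -n "$ids" ] && curl -fsSg "$(efetch_url "$ids")"
}
```

A call such as `fetch_fasta "txid10239[Organism]" > viruses.fasta` would write up to 20 viral records to a local file, which could then be added to the local BLAST database. For anything bigger than a handful of records, check NCBI's usage guidelines on request rates first.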

