Reproducible research demands that we can rerun any analysis and get the same results. But how should we deal with this when analyses depend on databases that are frequently updated?
We are currently facing this problem: we have implemented a daily automatic BLAST db update, but we would like to be able to roll back the database to its state on any given date.
I can see three ways this could be achieved:
1) In the best case this would be part of the BLAST command-line tool suite, for example by specifying with a --date flag up to which date the database should be queried. But to my knowledge, this functionality doesn't exist.
2) An easy solution would be to keep dated copies of the database. But this is obviously very resource intensive and doesn't seem like a good solution, especially if updates are frequent.
3) So the only reasonable solution would be to have some sort of versioning of the database. This would of course mean that if you want to rerun an analysis, you would first have to restore the database to a given rollback point and run the analysis from there.
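To make option 3 more concrete, here is a rough sketch of what I have in mind, not a tested solution. It assumes the database is a flat directory of files and that the archive lives on a single filesystem (hard links require that). The names `snapshot_db`, `archive_root`, `objects` and `snapshots` are made up for illustration. Each distinct file content is stored once, and every dated snapshot is a directory of hard links, so files that did not change between updates take no extra space.

```python
import hashlib
import os
import shutil
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot_db(db_dir: Path, archive_root: Path, label: str) -> Path:
    """Create archive_root/snapshots/<label> mirroring the files in db_dir.

    Hypothetical layout: each distinct file content is copied once into
    archive_root/objects (named by its hash), and the snapshot directory
    contains hard links to those objects under the original file names.
    """
    objects = archive_root / "objects"
    snapshot = archive_root / "snapshots" / label
    objects.mkdir(parents=True, exist_ok=True)
    # Refuse to overwrite an existing snapshot with the same label.
    snapshot.mkdir(parents=True, exist_ok=False)

    for source in sorted(db_dir.iterdir()):
        if not source.is_file():
            continue  # assumption: the database directory is flat
        stored = objects / file_sha256(source)
        if not stored.exists():
            shutil.copy2(source, stored)  # new content: store it once
        os.link(stored, snapshot / source.name)  # unchanged content: link only
    return snapshot
```

After each daily update one would call something like `snapshot_db(Path("/data/blastdb"), Path("/data/blast_archive"), "2024-01-31")` (paths and label are placeholders) and later point `-db` at the database name inside the matching snapshot directory. The snapshot files must be treated as read-only, since they share storage with the other snapshots. How much space this saves depends on how many database files actually change per update, which I have not measured.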
Do you have experience with this or any suggestions of tools which could provide a sensible solution for versioning the BLAST database? Or am I missing something and this functionality already exists?
Any input or discussion would be appreciated.