Bioinformatics: How To Version Control Small Scripts Located All Over The Server.
8.3 years ago

I have plenty of small scripts I want to version control but they are scattered all over the server. Have you found a clever way to deal with this?

bioinformatics • 5.7k views
8.3 years ago

Put the scripts in a central (GitHub|svn|...) repository.

Here are some examples:

It seems I then need to store all the files in a central folder like /allscripts, and if I need to use one of the scripts in a specific place like /scriptneeded/, I place a symlink in /scriptneeded/ pointing to /allscripts. Gotcha. Git only stores symlinks, not the files they point to (at least since version 1.6, according to SO).

I suggested the opposite solution: the real files are stored in git while the symbolic links are located in your workspaces.

Yeah, obviously. I just misunderstood at first.
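
A minimal sketch of that layout, assuming the real files live in a git checkout and each workspace only holds a symlink pointing back at them. The function name `link_into_workspace` and all paths are hypothetical, chosen for illustration.

```python
import os
import tempfile
from pathlib import Path


def link_into_workspace(repo_file: Path, workspace_dir: Path) -> Path:
    """Create (or refresh) a symlink in workspace_dir pointing at repo_file."""
    workspace_dir.mkdir(parents=True, exist_ok=True)
    link = workspace_dir / repo_file.name
    # Replace a stale link, but never clobber a real file.
    if link.is_symlink():
        link.unlink()
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(repo_file.resolve())
    return link


if __name__ == "__main__":
    # Demo in a throwaway directory; real paths would differ.
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "allscripts"       # hypothetical git checkout
        repo.mkdir()
        script = repo / "count_reads.sh"      # hypothetical script
        script.write_text("#!/bin/sh\necho counting\n")
        link = link_into_workspace(script, Path(tmp) / "scriptneeded")
        print(link, "->", os.readlink(link))
```

Because the workspace only contains links, committing in the checkout is enough to version every script, and the links themselves never need to be tracked.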

8.3 years ago
Hamish ★ 3.2k

The options that come to mind are:

• Use RCS. For local version control over individual files RCS is a reasonable fit.
• To use a single repository for all the files, with CVS, Subversion, git, etc., either:
    • Base your repository on the common shared parent directory, and only add and commit the required files and directories, using the 'ignore' capabilities of your version control software to keep it from nagging about stuff you don't want to version.
    • Map the files into a simplified directory tree using symlinks and version this (as suggested by Pierre Lindenbaum).
    • Consolidate the files into a simple tree for versioning, and either:
        • Use an installation script to push them to their final locations.
        • Build a package for your platform (e.g. RPM or deb) to install the files to the appropriate locations.
• Consolidate the files into a more version-control-friendly layout. In most cases your scripts should not care too much where they are, as long as their parameters allow for the specification of appropriate paths, either directly or through configuration files.
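
The "installation script" option from the list could be sketched roughly as follows. This assumes a manifest that maps each file in the repository to the place it should end up; the `install` function, the manifest format, and every path are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def install(repo_root: Path, manifest: dict, dest_root: Path) -> list:
    """Copy each repo-relative file to its destination under dest_root."""
    installed = []
    for src_rel, dest_rel in manifest.items():
        src = repo_root / src_rel
        dest = dest_root / dest_rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also keeps mode bits and timestamps
        installed.append(dest)
    return installed


if __name__ == "__main__":
    # Demo in a throwaway directory; real paths would differ.
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "repo"
        (repo / "qc").mkdir(parents=True)
        (repo / "qc" / "fastq_stats.py").write_text("print('stats')\n")
        # Hypothetical manifest: repository path -> final location.
        manifest = {"qc/fastq_stats.py": "projects/run42/bin/fastq_stats.py"}
        for path in install(repo, manifest, Path(tmp) / "server"):
            print("installed", path)
```

Keeping the manifest in the repository means the mapping from tidy tree to scattered locations is versioned along with the scripts.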

Git submodules could also be on this list to version-control everything, but with some logical separation of projects. And git, itself, can be used in a situation where lots of files in the tree are meant to be ignored, such as what I describe here (where I wanted to ignore a bunch of large files): http://watson.nci.nih.gov/~sdavis/blog/git_for_projects_with_large_files/
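
For the "ignore most of the tree" approach with git, one possible sketch is to generate a whitelist-style .gitignore that excludes everything except the chosen scripts. The helper `whitelist_patterns` and the example paths are hypothetical; the parent-by-parent pattern is needed because git cannot re-include a file whose parent directory is excluded.

```python
from pathlib import PurePosixPath


def whitelist_patterns(tracked_files):
    """Build .gitignore lines that ignore everything except tracked_files.

    Paths are relative to the repository root and use forward slashes.
    """
    # Ignore everything at the top level, but keep .gitignore itself.
    lines = ["/*", "!/.gitignore"]
    seen = set(lines)

    def add(line):
        if line not in seen:
            seen.add(line)
            lines.append(line)

    for rel in tracked_files:
        parts = PurePosixPath(rel).parts
        # Re-include each parent directory, then ignore its contents again.
        for depth in range(1, len(parts)):
            directory = "/" + "/".join(parts[:depth])
            add("!" + directory)
            add(directory + "/*")
        # Finally re-include the script itself.
        add("!/" + "/".join(parts))
    return lines


if __name__ == "__main__":
    # Hypothetical script locations under the shared parent directory.
    scripts = ["projects/align/run_bwa.sh", "home/me/bin/fix_headers.py"]
    print("\n".join(whitelist_patterns(scripts)))
```

Ignore rules only affect untracked files, so anything already committed stays tracked regardless of these patterns.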

8.3 years ago

I'm going to suggest something a bit different from a tangle of symbolic links, and a bit more general: use Environment Modules to deploy versions of a given package, which could come from a folder on your file system, a specific branch revision of a git repo, etc. In particular, this can be a very useful and clean way to manage scripts or programs with varied binary and version dependencies, a frequent problem in bioinformatics.

I agree here. Modules is used extensively on the NIH Biowulf cluster. It is a very simple, light-weight, and general solution and has the added benefit of being somewhat self-describing. Bundling a template Module file with your scripts is a great way to version the whole mess and allow for easy deployment.

Could you elaborate a bit on the "template Module file with your scripts is a great way to version the whole mess and allow for easy deployment" part? An example, perhaps?

By template modulefile, I mean a module file that can be modified to suit the end-user's needs, but already contains most of the boilerplate needed to make it work. In practice, you might want to include a modulefile that does not include a hard-coded path so that folks can adjust it to their own environment.
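
As a rough illustration of a modulefile without a hard-coded path, the sketch below keeps a template with a placeholder and fills it in at deployment time. The `@INSTALL_ROOT@` placeholder, the package name `labscripts`, and both helper functions are assumptions made for this example, not part of Modules itself; only `#%Module1.0`, `module-whatis`, and `prepend-path` are standard modulefile syntax.

```python
import tempfile
from pathlib import Path

# Template bundled with the scripts; the install path is filled in on deployment.
MODULEFILE_TEMPLATE = """\
#%Module1.0
## Template modulefile for a hypothetical collection of small scripts.
module-whatis "Small lab scripts (hypothetical package)"
prepend-path PATH @INSTALL_ROOT@/bin
"""


def render_modulefile(install_root: str) -> str:
    """Return the modulefile text with the placeholder replaced."""
    return MODULEFILE_TEMPLATE.replace("@INSTALL_ROOT@", install_root)


def write_modulefile(install_root: str, modulefiles_dir: Path,
                     name: str, version: str) -> Path:
    """Write the rendered modulefile to <modulefiles_dir>/<name>/<version>."""
    target = modulefiles_dir / name / version
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(render_modulefile(install_root))
    return target


if __name__ == "__main__":
    # Demo in a throwaway directory; real locations depend on the site setup.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_modulefile("/opt/labscripts/1.0", Path(tmp) / "modulefiles",
                                "labscripts", "1.0")
        print(path.read_text())
```

With the modulefiles directory on the module search path, something like `module load labscripts/1.0` would then put that version's `bin` directory on `PATH`.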

Got it. Thanks!