Forum: Why is academic software hard to install?
6
lh3 (30k), United States, wrote 3.1 years ago:

The top answer to "What are the biggest challenges bioinformaticians have with data analysis?" motivated me to ask this question. Personally, I have tried numerous software packages and frequently have problems with installation. I have my own answers to this question, but I would rather hear others' opinions. What is the major obstacle in your experience? Is it biased towards particular programming languages? How do you think we can improve? Is Docker the savior? Do you have examples of easy-to-install tools you wish other tools would follow? I might write a blog post to give my view, drawing on your opinions.

forum installation software • 3.0k views
modified 3.1 years ago by SES (8.0k) • written 3.1 years ago by lh3 (30k)

Shouldn't this be Forum? It's a bit of an open-ended discussion trigger, and some might even reject the premise in this generalized form. What's the evidence that academic software is 'hard to install'? Is the statement meaningful at all without a comparison? Maybe "harder to install" was meant, but then what do we compare it with: commercial software, all open-source software? That the answer to the other question has been upvoted doesn't necessarily mean it is true.

modified 3.1 years ago • written 3.1 years ago by Michael Dondrup (43k)

Changed to Forum.

written 3.1 years ago by lh3 (30k)

I think you can quantify it: count the number of dependencies (and add up the lines of code) and the number of steps required to build a working version of the tool. Both of these factors determine your time investment. Even if it feels natural to compile a large library like EMBOSS as a dependency of a program, wouldn't it be "easier" to just get the program with curl, if that were an option?
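
The counting suggested here can be sketched in a few lines of Python. This is only an illustration: the dependency graph, the package names and the number of build steps are invented for the example and do not describe any real tool, and lines of code are left out for brevity.

```python
# Hypothetical dependency graph: each key is a package, each value lists the
# packages it needs directly. All names and edges are invented for illustration.
DEPENDENCIES = {
    "mytool": ["emboss", "numpy"],
    "numpy": ["blas", "lapack"],
    "lapack": ["blas"],
    "emboss": [],
    "blas": [],
}


def transitive_dependencies(tool, graph):
    """Return every package that must be present before `tool` can be built."""
    seen = set()
    stack = list(graph.get(tool, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            # Follow the dependencies of the dependency as well.
            stack.extend(graph.get(dep, []))
    return seen


def install_burden(tool, graph, build_steps):
    """Crude score: transitive dependency count plus manual build steps."""
    return len(transitive_dependencies(tool, graph)) + build_steps


if __name__ == "__main__":
    print(sorted(transitive_dependencies("mytool", DEPENDENCIES)))
    print(install_burden("mytool", DEPENDENCIES, build_steps=3))
```

A score like this says nothing about how painful each individual dependency is, but it does make two tools comparable on the same scale.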

written 3.1 years ago by SES (8.0k)

Maybe one reason is that academic bioinformatics software is often developed for Linux; see also: http://www.psychocats.net/ubuntucat/software-installation-in-linux-is-difficult/

written 3.1 years ago by Michael Dondrup (43k)

This is definitely one reason. On Windows/Mac, developers typically ship self-consistent binaries compatible across multiple versions of the OS. On Linux, it is recommended to compile from source because of the differences between distros, especially in dynamic libraries. I think Linux does it the wrong way. While some programs do benefit from compilation from source, many others can be shipped as cross-distro binaries.

written 3.1 years ago by lh3 (30k)
9
Istvan Albert ♦♦ (74k), University Park, USA, wrote 3.1 years ago:

Academic software is hard to install because the ease of installation and usability are not required or rewarded directly.

When developing commercial software, the users can "reward" or "punish" the developers immediately and measurably. In a research paper, all one needs to convince are three reviewers. In addition, most of them may be experts and are not representative of the actual audience.

Recent experience: the claim was "our software installs easily and seamlessly via cmake". OK, I have that. It turns out CMake had to be above a certain version, but that new version only compiles if GCC is above a certain version, which in turn required a complete reinstall of glibc and a whole lot of other tools.

Adding insult to injury, it turns out an older CMake would have worked just as well. Nothing in the build made use of a newer CMake feature; this was simply the version the developer happened to have on their machine, so that was the version that got baked into the requirements.

modified 3.1 years ago • written 3.1 years ago by Istvan Albert ♦♦ (74k)
6

Yes! Library versions are so often the culprit. I have NumPy version X but this tool wants NumPy version Y and Python version Z. They say I can get Python Z running under virtualenv, but NumPy wants LAPACK and BLAS, and before you know it I've spent an entire day trying to install a tool that claimed "our software installs easily and seamlessly via cmake".

written 3.1 years ago by karl.stamm (3.2k)
1

This resembles my experiences.

written 3.1 years ago by lh3 (30k)
1

But is commercial software guaranteed to be better? Here is something we ran into this very morning:

A scientist brought in Illumina data on a hard drive, in a .bcl format similar to what the HiSeq currently produces. We do this type of conversion regularly, but this time our pipeline failed with an error. We contacted Illumina support (by the way, they are always very responsive and helpful, kudos to them!) and it turns out that we need to update our bcl2fastq converter from 1.8.2 to 1.8.4. It is very important that we use exactly 1.8.4 and not the newest version of the converter, which is actually called bcl2fastq2 and has version 2.15. We were then sent to this page:

http://support.illumina.com/downloads/bcl2fastq_conversion_software.html

The binary download for bcl2fastq.rpm clocks in at 774MB, or we can get the Linux tarball at 884MB. Seven hundred megabytes for a file converter? Note the size of the bcl2fastq2 converter: it is just 2MB, but that's the program that won't work.

modified 3.1 years ago • written 3.1 years ago by Istvan Albert ♦♦ (74k)

Commercial software is not "guaranteed" to be better, but as a whole it tends to be better.

written 3.1 years ago by lh3 (30k)

I think that many of these library annoyances would be discovered if tool authors did test-driven development on something like Travis-CI. The key is building the software "clean" each time you test, and with multiple different compiler/interpreter versions.

written 3.1 years ago by Matt Shirley (7.9k)
3
Giovanni M Dall'Olio (25k), London, UK, wrote 3.1 years ago:

A company has to sell its produced software as a product. If the software is not easy to install and nobody uses it, then nobody buys it, and the company fails.

Academic software is usually aimed at the same people who produce it. The software is released more as a way to improve reproducibility than to get an economic gain from it.

written 3.1 years ago by Giovanni M Dall'Olio (25k)
3
Devon Ryan (73k), Freiburg, Germany, wrote 3.1 years ago:

My most frequent annoyance is getting python programs installed on the cluster where I don't have root permissions to get things like numpy installed. Then you end up needing to muck with PYTHONPATH and friends to get everything installed yourself. It's not difficult, but it's an annoyance (more so when the admin makes you remove things from drives shared across nodes when you're not actually running jobs). I should probably just write a script to automate this sort of thing.

written 3.1 years ago by Devon Ryan (73k)

I agree. While expecting a GUI in academic software is not optimal, expecting software to have an all-things-handled user-install option should work, especially now that academic software environs are typically one of local systems, HPCs and cloud VMs, and on each the only install-time limiting parameter is root privileges.

written 3.1 years ago by Ram (12k)
1

This should be somewhat simplified if you use Anaconda and/or 'python setup.py install --user'.

written 3.1 years ago by brentp (22k)
1

Yes, or virtualenv.

written 3.1 years ago by Matt Shirley (7.9k)

And the first instruction under the installation guidelines is: "sudo pip install virtualenv"

So, I don't see how this is any better. Maybe for development, but I don't see how this helps the casual user that just wants to use package X right now.

written 3.1 years ago by SES (8.0k)

Emphasis on "should be". Especially on clusters, --user will often install things so they're only accessible to the login node. Python is still relatively convenient, though, compared to some other things.

written 3.1 years ago by Devon Ryan (73k)

I also think Python is relatively good in terms of installation. However, most Python tools require installation. In the old days, I used quite a few single Perl scripts, and no installation was needed. Installation-free is better than easy installation.

written 3.1 years ago by lh3 (30k)

If $HOME is not NFS'd, then you can set $PYTHONUSERBASE to somewhere that is. Then install --user will send it there.
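
To see the effect without installing anything, a small sketch like the one below asks a fresh interpreter where it would put --user installs once PYTHONUSERBASE is set. The directory name is a placeholder I made up; on a real cluster it would be whatever shared path all nodes can see.

```python
import os
import subprocess
import sys


def user_base_with(pythonuserbase):
    """Return the --user install base a fresh interpreter reports for this setting."""
    # Copy the current environment and override only PYTHONUSERBASE.
    env = dict(os.environ, PYTHONUSERBASE=pythonuserbase)
    result = subprocess.run(
        [sys.executable, "-c", "import site; print(site.getuserbase())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # "/shared/nfs/pyuser" is a hypothetical directory visible from every node.
    print(user_base_with("/shared/nfs/pyuser"))
```

In day-to-day use the same thing is usually done by exporting PYTHONUSERBASE in the shell profile before running the --user install.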

written 3.1 years ago by brentp (22k)

Yeah, that's what I eventually remember that I need to do :)

written 3.1 years ago by Devon Ryan (73k)

On high-performance machines, we usually don't have root permission to install any tools. The better way is to make a local directory that mimics the root and install there all essential packages that are not available system-wide. Then during installation you can use --prefix to direct the installed package into that local directory.

written 3.1 years ago by Renesh (1.1k)
2
Renesh (1.1k), United States, wrote 3.1 years ago:

There are several reasons behind this; I can discuss a few here:

  1. Most of these tools are written against a Unix background, so you need a good knowledge of Unix.
  2. The software depends on other libraries. Most of these libraries are difficult to install, and you need to understand the Unix OS to install them.
  3. The installation procedure is completely different from Windows.
  4. Most of the time you don't have permission to install in the root directory, so you are forced to install locally (e.g. on HPC). This is the most difficult task for most new bioinformaticians.
  5. Most of the software doesn't have a GUI.

written 3.1 years ago by Renesh (1.1k)
1

As bioinformatics people, we are supposed to be comfortable with the command line and understand that there is no "installation" in Unix. It might be a tad difficult to get out of a GUI-preferring, walkthrough-installer-seeking mindset, but once those limitations are crossed, the entire world becomes uniformly simple.

written 3.1 years ago by Ram (12k)
1

I understand that using command-line tools is difficult for a new bioinformatician. apt-get is just a way to install packages from the repository in Ubuntu. A large number of software packages are still not available in these repositories, and it becomes more difficult with Red Hat and Fedora. Therefore, we still need to install the tools manually using the standard Unix procedure (using make).

written 3.1 years ago by Renesh (1.1k)

True, it is difficult for a new bioinformatician, but then, these skills are part of the profile, IMHO. And yes, dependency resolution is a huge pain, especially when recursive dependencies need to be resolved in a version-specific fashion. We have automated dependency resolvers for Mac and Debian-based operating systems. Maybe we should look at something along those lines for RedHat/CentOS and Fedora?

written 3.1 years ago by Ram (12k)
2

I would say that not having a GUI is a blessing. You can't automate with a GUI. You can't even pipe.

written 3.1 years ago by Fotis T (30)

Couldn't agree more!

written 3.1 years ago by Ram (12k)

Sure you can (see Apple Automator or NI LabVIEW, etc.) — but it is often easier or faster to use the command line.

written 3.1 years ago by Alex Reynolds (21k)
2
John (12k), Germany, wrote 3.1 years ago:

When I program for myself:
- Pros: Client knows exactly what he wants, and can convey the problem to me reasonably well. Client is always lenient with deadlines.
- Cons: Client doesn't pay very well...

When I program for the Biotech industry:
- Pros: Client pays really well!
- Cons: Client often wants 'something just like X, but better'. Deadlines are often more important to the client than the product.

When I program for Academics:
- Pros: ???
- Cons: Client offended if you ask for payment. I feel bad for even asking. Client always prefers 'something just like what was used in publication X, but better'. Client doesn't mention deadlines to you, but Client may or may not suddenly publish a program just like yours without notice, at any moment.

Of course, I'm just joking. The real reason Academic programs are so hard to install/use is because writing code that is user friendly takes almost as much time as writing a novel algorithm in the first place. For the project I just finished/posted last week on Biostars (metaflagstat.py - painless bam/sam read flag counting), the programming time could be broken down like:

- 1 day writing the code (in python) that creates the flag-count matrix (the main 'novel' or useful feature of the program).
- 2 days writing the code that plots the interactive graphs (in Javascript, arguably the second 'novel' feature.)
- 2+ days fiddling with the HTML/CSS to make the plots look nice.
- 1 week cleaning up the above code, annotating it, swapping out non-core python packages like "requests" for clunky but core ones like "urllib"
- 2 days getting all the code hosted on the web, filming the how-to video, and posting about it on here.

So as you can see, I had a functioning program in just 3 days. I then spent over 10 days making it easy for others to use. That's 80% of the time I spent on the tool! :P

So bottom line is, if you want programmers who write these programs to make them more user-friendly, you have to accept that they'll probably only make 20% of the stuff that they used to make, because such a large proportion of their time is spent on 'User Experience' stuff like the installer, code-annotation, well written documentation, user interface, automatic update-checking, etc.

modified 3.1 years ago • written 3.1 years ago by John (12k)
4

I have a rule of 3 that ties in with your observation: making a program useful for others takes three times more effort than making a program that is useful to me.

written 3.1 years ago by Istvan Albert ♦♦ (74k)
1
Ram (12k), New York, wrote 3.1 years ago:

Personally, I have never felt this difficulty. I have quite a bit of XP installing locally as well as installing in a user-specific fashion on HPC. Honestly, if you give me a .tgz or if it is available as a module in apt-get or on homebrew, it doesn't get simpler.

written 3.1 years ago by Ram (12k)
3

On shared or high security systems, we don't get to "sudo apt-get" or "yum" or even homebrew. Any instructions that start with 'sudo' are useless.

written 3.1 years ago by karl.stamm (3.2k)

I know. Which is why the first option: .tgz :-)

I mention apt-get for cloud VMs. homebrew (which does not need sudo, BTW) for local Mac. .tgz is universal.

And ENV variables are set in ~/.bash_profile anyway!

modified 3.1 years ago • written 3.1 years ago by Ram (12k)
1

Well, this really depends on what is in .tgz. If it is a precompiled statically linked binary in .tgz, I couldn't be happier. If it is source code claimed to be installed "easily and seamlessly via cmake", it might turn out to be a nightmare.

written 3.1 years ago by lh3 (30k)
1
SES (8.0k), Vancouver, BC, wrote 3.1 years ago:

I think the most important things developers should consider are the target OS (and version), the programming language (and version), and the libraries (and their versions) you use when building a tool. All of these can be thoroughly tested using continuous integration, but to take full advantage of it we would need to write tests that cover all the methods and statements, and to write good tests we need documentation. Therefore, I think it would take a lot of work to get most academic software to the point of using continuous integration, because I can't think of many toolkits that actually have a test suite and documentation (not just a readthedocs webpage, because that is not helpful at the command line or for developing).

Getting there is much easier than you might think, though. Every Perl package I build has tests to check syntax and formatting and to check that every method is documented, so I can focus on more important things. Of course, a prerequisite to building a system like this is to use version control, because you can't maintain a script/package from just your computer and expect it to work in another environment.

You asked for examples, so I will share some features of Perl that I wish more people in bioinformatics knew about. Since Perl doesn't consider whitespace significant, you can use tools to pack your dependencies into a single file, so every script/app is dependency-free. This also works with C libraries: you can compile a Perl script and the C libraries it uses into a single binary. Also, there are some fantastic admin-free tools like perlbrew and cpanminus. Both of those tools are examples of my previous point: they can be installed with a single curl command. Therefore, no one should ever have to use "sudo" to do anything with Perl. From a developer's perspective, you can easily manage many versions of Perl with perlbrew or plenv, and use them all to compile a script with a single command (though I use CI to do that for me). With cpanminus, I use a cpanfile (which is a dependency spec just like a Gemfile for Ruby), so I can pin my installation to ensure one library version doesn't break the build. There are other tools for managing dependencies too, like Carton and Stratopan, and those allow you to ensure that "version 0.07" always ships with the same libraries, not newer ones that the package manager decides to install.

I don't have a general solution for shipping core dependencies like graphics libraries, I wish there was one. You would think that people who use these in industry have clever methods for updating and deploying these tools rapidly (maybe Docker, as you said).

modified 3.1 years ago • written 3.1 years ago by SES (8.0k)

I like the idea of packing dependencies into a single file. I hope python has a similar tool and wish more developers could ship their python-only/perl-only package in one file.
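
For pure-Python code, the standard library's zipapp module comes close to this. The sketch below is a minimal illustration, not a recommendation for any particular project: the file names and the tiny two-module program are made up, and it assumes the code being packed has no compiled extension modules, since those cannot be imported from inside a zip archive.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_single_file_app(workdir):
    """Pack a tiny two-module program into one runnable .pyz archive."""
    src = Path(workdir) / "hello_app"
    src.mkdir()
    # helper.py stands in for a pure-Python dependency (hypothetical name).
    (src / "helper.py").write_text(
        "def greet(name):\n    return 'hello ' + name\n"
    )
    # __main__.py is the entry point executed when the archive is run.
    (src / "__main__.py").write_text(
        "from helper import greet\nprint(greet('biostars'))\n"
    )
    target = Path(workdir) / "hello_app.pyz"
    zipapp.create_archive(src, target)
    return target


def demo():
    """Build the archive in a temporary directory, run it, return its output."""
    with tempfile.TemporaryDirectory() as workdir:
        archive = build_single_file_app(workdir)
        result = subprocess.run(
            [sys.executable, str(archive)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(demo())
```

The resulting hello_app.pyz is a single file that a user can copy anywhere and run with their existing Python, with development still happening in separate source files.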

written 3.1 years ago by lh3 (30k)

there are a few tools that aim to do that (freeze to a single executable) with varying degrees of success. but i don't like the idea of forcing the inclusion of dependencies--into a single-file. that's like the old-school perl days. certainly docker is a better solution.

written 3.1 years ago by brentp (22k)

What do you think is old school about including dependencies? There are a lot of environments where installing deps, or upgrading, is not an option. This approach is no different from shipping a distribution with your classes defined in separate modules, except in this case your classes are in one file. All development happens like normal, but for deployment the package is "packed." There is no worry of incompatibilities because everything has been tested before you ship, and the real plus is that your app is not going to mess with the system perl when people use "sudo" to install it with the vendor-distributed perl. In that case, only one file (plus the docs) will be installed.

written 3.1 years ago by SES (8.0k)

just because dependency management isn't a solved or easy problem doesn't mean I should have to include all dependencies in the current tree.

written 3.1 years ago by brentp (22k)

I wouldn't recommend that as a general solution either and that approach is not what I was referring to (I'll try to elaborate here). As lh3 and I mentioned, you would develop a package just like normal, usually with all your classes, roles, etc. in separate files. That is where all the coding and testing takes place. 

For reference, when class method X is called that method name is looked up in the symbol table, and if you already loaded the class containing method X (or defined it in the current file) then the method name should exist in the symbol table. The approaches I was referring to inspect the symbol table once your application is loaded, and the required methods are packed into a single file. I think of the whole process as similar to compiling a C program, and in either case you wouldn't be concerned with reading or editing a compiled binary, that just needs to be understood by the computer.

I mentioned this approach because it appears to be a general solution that is common in other fields. I think we should steal approaches from as many fields and programming languages as possible, if it can make our lives easier.

written 3.1 years ago by SES (8.0k)

The source code can still be spread across multiple files. The single-file version is just like a precompiled binary, where readability is not a huge concern. I much prefer the old-school Perl days, tbh. I would rather write 1000-line scripts than ask my users to install. Docker is nice conceptually, but it is too early to adopt it when none of the HPC systems I use is compatible with it. Developers should meet users' settings, not ask users to match their working environment.

written 3.1 years ago by lh3 (30k)