Tutorial: Install required dependencies for GATK4 on a remote server without root privileges
9 months ago

Motivation


I wrote this tutorial because I spent too many hours trying to install the dependencies of GATK4 on a remote server without root privileges.

GATK is one of the most widely used tools for calling variants.

However, the documentation and installation support for other platforms are limited, and it is hard to install the R and Python dependencies.

  • If you have access to Docker on your remote server, GREAT, you don't need this. Just follow the instructions for using the GATK container.
  • If you have root privileges on your remote server, GREAT, you don't need this either; some of the steps here will be overly complicated. Just use conda on your remote server.
  • If you (like me) have access to neither Docker nor root privileges on the remote server, I hope this tutorial solves some of the problems you encounter and saves you many hours of googling.

Principles


You will need to know how to use Docker and its containers before using this tutorial. You will also need to know how to use conda to manage virtual environments.

I will create a Docker container with the GATK dependencies, using conda as the package manager. Then I export that virtual environment and upload it to the remote server (where you do not have root privileges to install dependencies).

Note: The container released by the Broad Institute, which can be pulled with docker pull broadinstitute/gatk:latest, sadly cannot be used to install conda-pack and export the environment outside of the container.

Step 1


Download the latest version of GATK from GitHub to your working directory. At the time of writing, this is gatk-4.2.5.0.

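cd /path/to/dev/dir/
wget https://github.com/broadinstitute/gatk/releases/download/4.2.5.0/gatk-4.2.5.0.zip
# the release zip already contains a top-level gatk-4.2.5.0/ directory
unzip gatk-4.2.5.0.zip
cd gatk-4.2.5.0
# Work-around, because I cannot run `conda env create -f gatkcondaenv.yml` directly:
# split gatkcondaenv.yml into three files that are installed in Step 2.
# environment1.yml: a minimal env named gatk with Python 3.6.10
printf "name: gatk\nchannels:\n- conda-forge\n- defaults\ndependencies:\n- python=3.6.10\n- ipython\n" > environment1.yml
# requirement.txt: top-level "- package" entries with trailing comments stripped,
# dropping the first three entries (two channels + python) and the final "pip:" entry
grep "^-" gatkcondaenv.yml | sed 's/- //; s/ .*//; $d; 1,3d' > requirement.txt
# environment2.yml: the trailing pip section (needs GNU sed for the \n)
tail -n 2 gatkcondaenv.yml | sed '1s,^,dependencies:\n,' > environment2.yml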
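The grep/sed lines above are positional: they assume the two channels and python are the first three "- " entries and that the pip section is the last two lines of gatkcondaenv.yml. The sketch below runs the same pipeline on a made-up, abridged yml with that layout (not the real GATK file), so you can see what each output file should contain before feeding it to conda. It assumes GNU sed.

```shell
#!/usr/bin/env bash
# Illustrative only: a hypothetical, abridged gatkcondaenv.yml that mimics the
# layout the Step 1 pipeline relies on (channels, python first, pip last).
set -euo pipefail
workdir=$(mktemp -d)
cd "$workdir"
cat > gatkcondaenv.yml <<'EOF'
name: gatk
channels:
- conda-forge
- defaults
dependencies:
- conda-forge::python=3.6.10   # do not update
- pip=20.0.2
- conda-forge::numpy=1.17.5    # a trailing comment
- r-base=3.6.2
- pip:
  - gatkPythonPackageArchive.zip
EOF
# same commands as in Step 1
grep "^-" gatkcondaenv.yml | sed 's/- //; s/ .*//; $d; 1,3d' > requirement.txt
tail -n 2 gatkcondaenv.yml | sed '1s,^,dependencies:\n,' > environment2.yml
echo "--- requirement.txt"; cat requirement.txt
echo "--- environment2.yml"; cat environment2.yml
```

Here requirement.txt ends up holding pip=20.0.2, conda-forge::numpy=1.17.5 and r-base=3.6.2, and environment2.yml is the pip section under a dependencies: key. If a future gatkcondaenv.yml changes its layout, these positional deletes will silently pick the wrong lines, so it is worth eyeballing the real output the same way.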

Step 2


I use Docker to create an image based on Ubuntu 18.04 (the same OS as the container provided by GATK), install Miniconda3, and then install the other R and Python dependencies of GATK4.

Sadly, the Broad Institute's instruction for installing the dependencies, conda env create -n gatk -f gatkcondaenv.yml, does not work for me. The conda process stops while "solving environment" and no new environment is created.

So I looked for a work-around. First, create a new environment with python=3.6.10 as the main Python. Then install the other GATK dependencies into that environment with conda install --file requirement.txt (requirement.txt is converted in Step 1 from the gatkcondaenv.yml file provided by GATK), and finally add the pip section from environment2.yml.

The following Dockerfile is what I used to build my image with docker build -t your_account/gatk:4.2 -f Dockerfile .

The . at the end is your build context (the directory you build the image from, which must contain the files from Step 1). After that, you should have a Docker image named your_account/gatk:4.2.

# docker pull ubuntu:18.04
# docker build -t your_account/gatk:4.2 -f Dockerfile .
FROM ubuntu:18.04
WORKDIR /opt/
# the two GATK files plus the three files generated in Step 1
COPY gatkcondaenv.yml ./
COPY gatkPythonPackageArchive.zip ./
COPY environment1.yml ./
COPY environment2.yml ./
COPY requirement.txt ./
# system packages, then Miniconda3 installed in batch mode (-b) into /miniconda
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -yq curl wget jq vim less nano && \
    curl -LO https://repo.anaconda.com/miniconda/Miniconda3-py39_4.10.3-Linux-x86_64.sh && \
    bash Miniconda3-py39_4.10.3-Linux-x86_64.sh -p /miniconda -b && \
    rm Miniconda3-py39_4.10.3-Linux-x86_64.sh
# create the conda env "gatk" (python=3.6.10) first
ENV PATH=/miniconda/bin:${PATH}
RUN conda update -y conda && conda init && \
    conda env create -f environment1.yml
# run the following RUN commands inside the gatk env
SHELL ["conda", "run", "-n", "gatk", "/bin/bash", "-c"]
# install the remaining gatk dependencies and the pip section into gatk,
# then conda-pack into base (used in Step 3 to export the env)
RUN conda install -y -n gatk --file requirement.txt && \
    conda env update -n gatk --file environment2.yml && \
    conda install -y -n base -c conda-forge conda-pack
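Because docker build only sees files inside the build context, every file named in a COPY line must be in the directory you pass as the final argument. A small pre-flight check can catch a missing file before a long build. This is a minimal sketch: the function name check_build_context is hypothetical, and its file list simply mirrors the COPY lines of the Dockerfile above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a Docker build context contains every
# file that the Dockerfile above COPYs into /opt/.
check_build_context() {
  local ctx="${1:-.}"
  local missing=0
  local f
  for f in gatkcondaenv.yml gatkPythonPackageArchive.zip \
           environment1.yml environment2.yml requirement.txt; do
    if [ ! -f "${ctx}/${f}" ]; then
      echo "missing from build context: ${f}" >&2
      missing=1
    fi
  done
  # 0 if everything is present, 1 otherwise
  return "${missing}"
}
```

Usage, from the gatk-4.2.5.0 directory: check_build_context . && docker build -t your_account/gatk:4.2 -f Dockerfile .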

Alternatively, you can run these commands interactively inside a container yourself and commit the container once the dependencies work. Below is the (abridged) output of my docker build with the Dockerfile above.

$ docker build -t your_account/gatk:4.2 -f Dockerfile .
[+] Building 730.5s (15/15) FINISHED                                                                                   
 => [internal] load build definition from Dockerfile                                                              0.0s
...
...
 => => exporting layers                                                                                          68.2s 
 => => writing image sha256:fc50be4ed681277ea2b7927b622b97c6b4e9c6eb9a4d73224183317a4efe3ef9                      0.0s 
 => => naming to docker.io/your_account/gatk:4.2                                                                  0.0s

Step 3


Hopefully, the required dependencies are now installed. You can test that they work by running python -c "import vqsr_cnn" inside the gatk environment; the output should start with Using TensorFlow backend.

After that, run the container with a host directory mounted as a volume so that you can write the packed environment outside of the container. Then upload it to your remote server and unpack it:

# on your local machine (where Docker is available)
dir="/path/to/output/"
docker run --rm -v "${dir}":/mnt/ -it your_account/gatk:4.2
# inside the container, pack the gatk env into the mounted volume, then leave
conda pack -n gatk -o /mnt/gatk.tar.gz
exit
# back on the local machine, gatk.tar.gz is now in ${dir}
# upload it to your remote server, for example with rsync
src="${dir}/gatk.tar.gz"
dest="username@remote_host_ip:/path/for/env/dir/"
rsync -aP "${src}" "${dest}"
# on your remote server
cd /path/for/env/dir/
mkdir -p gatk
tar -xzf gatk.tar.gz -C gatk
# Activate the environment. This adds gatk/bin to your PATH
source /path/for/env/dir/gatk/bin/activate
# fix up the hard-coded prefixes so that you can run R and Python without problems
conda-unpack
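If you unpack environments more than once, the remote-side commands can be wrapped in a small function that also checks the archive really is a conda-pack environment (conda-pack archives ship bin/activate, which is what gets sourced above) before you activate it. This is a sketch; the name unpack_env is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the remote-side steps above: extract a
# conda-pack tarball into a target directory and sanity-check the result.
unpack_env() {
  local tarball="$1" target="$2"
  mkdir -p "${target}"
  tar -xzf "${tarball}" -C "${target}" || return 1
  # a packed conda env has bin/activate at its top level
  if [ ! -f "${target}/bin/activate" ]; then
    echo "no bin/activate in ${target}; is this a conda-pack archive?" >&2
    return 1
  fi
}
```

Usage on the remote server: unpack_env gatk.tar.gz /path/for/env/dir/gatk, then source /path/for/env/dir/gatk/bin/activate and run conda-unpack as before.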

You don't need to be root to run GATK.

cd gatk-4.2.5.0
./gatk

Sorry, I am updating the post. It is meant for installing the dependencies needed to run some GATK tools, like VQSR: python -c "import vqsr_cnn"


One can use conda. There is a YAML file provided by GATK: conda env create -f gatkcondaenv.yml


I tried that command too. In my case, I started with a new container running Ubuntu 18.04, installed Miniconda, downloaded GATK4, and installed the dependencies with conda env create -f gatkcondaenv.yml.

The command failed while solving the environment:

$ conda env create -n gatk4 -f gatkcondaenv.yml
Collecting package metadata (repodata.json): done
Solving environment: / (base) root@a8edc21b1512:/mnt/Tools/gatk-4.2.5.0#

You could simply use any of the Docker base images that have conda (or mamba) out of the box, such as:

As @dariober says below, once you have that running you could simply install GATK via conda itself. Also, what is wrong with the official GATK container from the Broad? https://hub.docker.com/r/broadinstitute/gatk/

Anyway, thanks for the effort!


Oh, nothing is wrong with it. If you have access to Docker on your server, you can just use it, and this tutorial will be redundant for you.

I wrote this for the specific case where you do not have access to Docker on your remote server.

Admittedly, this is a very specific situation, but I wanted to share it somewhere because I spent a lot of time trying to resolve this. Haha


Hello, I'm facing a similar situation: solving the environment fails when I try to create the conda environment from the yml file, and docker pull gives me Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock. However, when I tried to follow your protocol and build a new image via docker build -t your_account/gatk:4.2 -f Dockerfile . it gives me the same permission denied error. I wonder if you have encountered something similar...?

Thank you!

9 months ago

I haven't fully tested the instructions below, but it seems to me that you are making things more complicated than necessary. There shouldn't be any need to resort to Docker.

Install conda if it is not already available. Follow the instructions at the prompt; answering yes to most questions is fine for most users.

curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Configure conda to use Bioconda:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

The above needs to be done only once and is unrelated to GATK.


Create an environment for gatk4 or for your project (not strictly necessary, but preferable to keep projects self-contained):

conda create -n my-gatk4-env

Activate the environment and install gatk4 (choose your version)

conda activate my-gatk4-env
conda install 'gatk4=4.2.5'

Maybe I did not make myself clear enough in the title and motivation.

The problem is not installing gatk4.

The problem is getting enough Python and R dependencies to run gatk AnalyzeCovariates and python -c "import vqsr_cnn", i.e. the tools that require the dependencies that come with GATK.

I will edit the title and motivation accordingly. Also, as stated above, this is redundant if you have Docker access or root privileges on the remote server.


Conda does not require root privileges, and installing a particular package will bring in all of its dependencies.


At least in my case, I don't think I can create a new environment with the conda command on the remote server. The available environments on the server are saved in a location that I cannot edit, and I cannot run conda install gatk4 on the remote server.


I don't think I can create a new environment with the conda command on the remote server

Have you tried issuing conda create -n my-env? What error do you get?

It may be that you are using a system-wide installation of conda, which does not allow regular users to create environments and install programs (and which defeats the point of conda). What output do you get from which conda? If you are using a system-wide conda, then install your own as per the instructions.


I can create conda environments, and I have done so multiple times to install other software. However, when I try to create a new environment for GATK4 with the yml file (and the command line they suggest), it reports conflicts and the installation fails. (Nor can I run conda install gatk4, because it also gets stuck at the solving environment step.) I'm not sure what the problem is here...


That's because you're trying to install a package into the base conda environment, which is generally unsafe. The base conda environment in a system-wide installation is shared between users and, most of the time, locked to prevent package installation without superuser privileges. You're still able to create a user-specific virtual environment (which will be written into your $HOME path).
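To tell whether the conda on your PATH is a shared, system-wide install or one you own, you can check whether its base prefix is writable by you. The sketch below does only that; the function name conda_prefix_writable is hypothetical, and a read-only result is a hint rather than proof of how the server is set up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a conda base prefix is writable by the
# current user. A read-only base usually means a shared, system-wide install.
conda_prefix_writable() {
  local prefix="$1"
  if [ -w "${prefix}" ]; then
    echo "writable: ${prefix}"
  else
    echo "read-only: ${prefix} (create envs under \$HOME or install your own Miniconda)"
    return 1
  fi
}
```

Usage: conda_prefix_writable "$(conda info --base)". If it reports read-only, installing into base will fail, but conda create -n my-env should still work in a user-owned location, or you can install your own Miniconda as described above.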
