Question

how to run bioinformatics tools via docker

2

6.0 years ago

genya35 ▴ 40

Hello,

Could someone please suggest a tutorial or explain how to run bioinformatics tools via Docker? I have Docker installed on Ubuntu Linux, but I am struggling to understand what to do after the image is pulled from GitHub.

Thanks

RNA-Seq • 5.2k views

ADD COMMENT • link 5.9 years ago by genya35 ▴ 40

0

The problem is with the Dockerfile supplied at https://github.com/ndaniel/fusioncatcher. The image doesn't load correctly. Could you please try pulling the image to see if you get the same error messages?

ADD REPLY • link 5.9 years ago by genya35 ▴ 40

1

Due to disk space limitations, I did not include the databases in the image, but otherwise the build completed successfully. If you would like to test it, the Docker image I built is available on Docker Hub.

ADD REPLY • link 5.9 years ago by vimalkvn ▴ 320

0

Thank you very much!!!

The image works great. I also tried to run fusioncatcher-batch.py from the image to compare tumor samples to matching normal samples, but the script could not be found. Is fusioncatcher-batch.py included in the Dockerfile?

Thanks

I've tried to build the databases with:

$ docker run -v /data:/data --rm vimalkvn/fusioncatcher-nodb fusioncatcher-build -g homo_sapiens -o /data/FusionCatcher

and encountered the errors below. I'm not sure whether this has anything to do with the Dockerfile.

path-mdl-2:/data$ docker run -v /data:/data --rm vimalkvn/fusioncatcher-nodb fusioncatcher-build -g homo_sapiens -o /data/FusionCatcher
WARNING: Cannot restart automatically because the previous log file '/data/FusionCatcher/fusioncatcher-build.log' cannot be found!
The workflow will be restarted from the beginning with step 1!
Python version: 2.7.6 (default, Nov 23 2017, 15:49:48)
[GCC 4.8.4]
Python executable: /usr/bin/python
1.63
Downloading the genome of organism 'homo_sapiens' from Ensembl!
FTP Error = [Errno -2] Name or service not known
Downloading the genome of organism 'homo_sapiens' from Ensembl!
FTP Error = [Errno -2] Name or service not known

Log of the pipeline:

Starting execution with step 1.
////////////////////////////////////////////////////////////////////////////////
  Running: step = 1   Time: 13:43   Date: 2018-05-09 (elapsed time: 0d:0h:0m)
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
==> Current working directory: '/opt'
python_version.py
+-->EXECUTING...
==> Execution time: 0 day(s), 0 hour(s), 0 minute(s), and 0 second(s)
////////////////////////////////////////////////////////////////////////////////
  Running: step = 2   Time: 13:43   Date: 2018-05-09 (elapsed time: 0d:0h:0m)
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
==> Current working directory: '/opt'
biopython_version.py
+-->EXECUTING...
==> Execution time: 0 day(s), 0 hour(s), 0 minute(s), and 0 second(s)
////////////////////////////////////////////////////////////////////////////////
  Running: step = 3   Time: 13:43   Date: 2018-05-09 (elapsed time: 0d:0h:0m)
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
==> Current working directory: '/opt'
printf \
"homo_sapiens"  \
+-->EXECUTING...
==> Execution time: 0 day(s), 0 hour(s), 0 minute(s), and 0 second(s)
////////////////////////////////////////////////////////////////////////////////
  Running: step = 4   Time: 13:43   Date: 2018-05-09 (elapsed time: 0d:0h:0m)
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
==> Current working directory: '/opt'
get_genome.py \
--organism homo_sapiens \
--server ftp.ensembl.org \
--output /data/FusionCatcher/
+-->EXECUTING...

ERROR: Workflow execution failed at step 4 while executing:

get_genome.py \
   --organism homo_sapiens \
   --server ftp.ensembl.org \
   --output /data/FusionCatcher/

Executing second time the same step/command in order to capture error messages (i.e. STDERR)...

ADD REPLY • link 5.9 years ago by genya35 ▴ 40

0

No other changes have been made to the Dockerfile. You can find it here.

The FTP errors ("Name or service not known") mean that the host name ftp.ensembl.org could not be resolved, which points to a connectivity issue (probably DNS or ports blocked in the firewall) from your computer to ftp.ensembl.org. This could also be why the original docker build failed.
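
One way to narrow this down is to check whether ftp.ensembl.org resolves on the host and inside a container. The following is only a sketch under a few assumptions: it uses the public busybox image for its nslookup applet, 8.8.8.8 is just one example of a public resolver, and check_dns, run and DRY_RUN are hypothetical names. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Host name taken from the FusionCatcher log in this thread.
HOST="ftp.ensembl.org"

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

check_dns() {
    # 1. Does the name resolve on the host itself?
    run getent hosts "$HOST"
    # 2. Does it resolve inside a container with Docker's default DNS settings?
    run docker run --rm busybox nslookup "$HOST"
    # 3. Does it resolve inside a container with an explicit resolver?
    run docker run --rm --dns 8.8.8.8 busybox nslookup "$HOST"
}

check_dns
```

If the first check succeeds but the second fails, the container's DNS configuration is the more likely culprit, and passing --dns to the fusioncatcher-build container may be worth trying. If all three fail, the problem is more likely on the network or firewall side.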

As for fusioncatcher-batch.py, it appears not all scripts are in the PATH. You can run it using the full path to the script like this:

docker run --rm vimalkvn/fusioncatcher-nodb /opt/fusioncatcher/v1.00/bin/fusioncatcher-batch.py
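
If the location of a script inside an image is not known, it can usually be found by running find in a throwaway container. A minimal sketch, assuming the image ships the standard find utility, defines no entrypoint that would swallow the command, and keeps FusionCatcher under /opt as in the command above; locate_in_image, run and DRY_RUN are hypothetical names, and by default the command is only printed.

```shell
IMAGE="vimalkvn/fusioncatcher-nodb"

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Search a directory inside the image for a file name.
# $1 = directory inside the container, $2 = file name to look for
locate_in_image() {
    run docker run --rm "$IMAGE" find "$1" -name "$2"
}

locate_in_image /opt fusioncatcher-batch.py
```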

ADD REPLY • link 5.9 years ago by vimalkvn ▴ 320

0

I'm trying to run fusioncatcher in somatic mode, but it's not working:

docker run --rm -v /home/genya7/mst/rna_seq/:/data vimalkvn/fusioncatcher-nodb /opt/fusioncatcher/v1.00/bin/fusioncatcher-batch.py -i /data/tumor/ -n /data/normal/ -o /data/output_90530/

It works in this mode:

docker run --rm -v /home/genya7/mst/rna_seq/:/data vimalkvn/fusioncatcher-nodb fusioncatcher -d /data/human_data/human_v90/ -i /data/tumor/ -o /data/output_90530/

Thank you very much for your help

ADD REPLY • link 5.7 years ago by genya35 ▴ 40

score 2 · Answer 1 · 2018-04-27

2

6.0 years ago

Nicolas Rosewick 10k

Take a look at the docker exec command: https://docs.docker.com/engine/reference/commandline/exec/

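Let's say you have a Docker image named dockerbwa in which bwa is installed.

To execute bwa, first start a container from the image with the docker run command, then execute the command inside that running container using docker exec (which takes a container name or ID, not an image name):

docker run -d --name bwa_container dockerbwa tail -f /dev/null
docker exec -i -t bwa_container bwa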
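
For a command-line tool, a one-off container is often simpler than docker run followed by docker exec. A minimal sketch, assuming a hypothetical image named dockerbwa (Docker image names must be lowercase) with bwa on its PATH; ref.fa, bwa_usage, bwa_index, run and DRY_RUN are made-up names. By default the commands are only printed; set DRY_RUN=0 to execute them.

```shell
IMAGE="dockerbwa"   # hypothetical image name

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Run bwa once in a throwaway container that is removed afterwards.
bwa_usage() {
    run docker run --rm "$IMAGE" bwa
}

# Index a reference in the current directory: mount that directory at /work
# and make /work the working directory inside the container.
# $1 = FASTA file name relative to the current directory
bwa_index() {
    run docker run --rm -v "$PWD":/work -w /work "$IMAGE" bwa index "$1"
}

bwa_usage
bwa_index ref.fa
```

Because the current directory is mounted into the container, the index files that bwa writes next to ref.fa remain on the host after the container is removed.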

ADD COMMENT • link 6.0 years ago by Nicolas Rosewick 10k

0

I thought the whole point of using Docker was that nothing needs to be installed, but you are suggesting using a Docker image where bwa is already installed. I don't understand this.

ADD REPLY • link 6.0 years ago by genya35 ▴ 40

1

Docker provides a way to load and run pre-built software from containers. You may want to read an introduction to some of the terminology and then this answer may be a little clearer.

ADD REPLY • link 6.0 years ago by Alex Reynolds 35k

0

The answer is clear, but it doesn't seem to work with the tool (FusionCatcher) I would like to run. I've pulled the image from here: https://hub.docker.com/r/cgrlab/fusioncatcher/ and created a container, but the exec command does not work at all. If you could suggest how to run this tool, I would really appreciate it. Thanks

ADD REPLY • link 6.0 years ago by genya35 ▴ 40

0

Running a container depends on how the image is built. Usually calling docker run image_name would directly run the program. In this case, you can run the container like this:

docker run --rm cgrlab/fusioncatcher /opt/fusioncatcher/bin/fusioncatcher

You will also need to share the folders containing your input data, output data and the databases with the container. You can do this using the -v option, for example to share a data folder in your current directory (the host side must be an absolute path, otherwise Docker treats it as a named volume):

docker run --rm -v "$(pwd)/data":/data cgrlab/fusioncatcher /opt/fusioncatcher/bin/fusioncatcher

Then fusioncatcher will be able to read files from /data. -v can be used multiple times to share different folders.

The --rm just removes the container after it has run to save space.
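
A minimal sketch of sharing several folders at once. The host folder names (fastq, human_v90, results) and the container mount points are hypothetical, as are run_fusioncatcher, run and DRY_RUN; the -d, -i and -o options are the fusioncatcher options used elsewhere in this thread. The :ro suffix mounts a folder read-only and can be dropped if the tool needs to write there. By default the command is only printed; set DRY_RUN=0 to execute it.

```shell
# Hypothetical host folders; adjust to your own layout.
INPUT_DIR="$PWD/fastq"
DB_DIR="$PWD/human_v90"
OUTPUT_DIR="$PWD/results"

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# One -v option per shared folder, each written as host_path:container_path.
run_fusioncatcher() {
    run docker run --rm \
        -v "$INPUT_DIR":/data/input:ro \
        -v "$DB_DIR":/data/db:ro \
        -v "$OUTPUT_DIR":/data/output \
        cgrlab/fusioncatcher /opt/fusioncatcher/bin/fusioncatcher \
        -d /data/db -i /data/input -o /data/output
}

run_fusioncatcher
```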

ADD REPLY • link 6.0 years ago by vimalkvn ▴ 320

0

The wrong version of Bowtie was in that container. I've attempted to build an image using the Dockerfile directly from GitHub: https://github.com/ndaniel/fusioncatcher

I copied the content of the Dockerfile into a text file and executed:

$ docker build - < /data/fusion_catcher_docker.txt

The image was built, with a few errors. I've tried running the image as follows:

docker run --rm -v /data:/data ab38bce870fc \
     opt/fusioncatcher/bin/fusioncatcher \
    -d /data/human_data/human_v90/ \
    -i /data/tumor/ \
    -o /data/output/

I get the following error:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "exec: \"opt/fusioncatcher/bin/fusioncatcher\": stat opt/fusioncatcher/bin/fusioncatcher: no such file or directory": unknown.
ERRO[0000] error waiting for container: context canceled

Please suggest how to proceed.

Thank you

ADD REPLY • link 6.0 years ago by genya35 ▴ 40

0

From the error message:

stat opt/fusioncatcher/bin/fusioncatcher: no such file or directory

There is a typo in the path supplied to docker: the leading slash is missing. It should be /opt/fusioncatcher/bin/fusioncatcher
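
The command passed to docker run is looked up inside the container, and a path without a leading slash is not resolved from the filesystem root. A small sketch of a guard against this mistake; is_absolute_path is a hypothetical helper, not part of Docker or FusionCatcher.

```shell
# Succeed only if the given path starts with "/".
is_absolute_path() {
    case "$1" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Compare the path from the failing command with the corrected one.
for candidate in opt/fusioncatcher/bin/fusioncatcher /opt/fusioncatcher/bin/fusioncatcher; do
    if is_absolute_path "$candidate"; then
        echo "absolute: $candidate"
    else
        echo "relative (missing leading slash?): $candidate"
    fi
done
```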

ADD REPLY • link 5.9 years ago by vimalkvn ▴ 320