How to create a Dockerfile without copying large input data, and build an image that runs a Snakemake workflow as the ENTRYPOINT
14 months ago
majeedmj.ict ▴ 20

My project folder is structured as shown below and is more than 50 GB in size.

I am creating a Dockerfile so that the Snakefile workflow, which uses data from these folders, runs inside Docker with the snakemake command as the entrypoint.

├── adapters.fa
├── Dockerfile
├── genome
│ ├──
├── genomeIndex
│ ├──
├── RawReads
│ ├── sample_1.fastq.gz
│ └── sample_2.fastq.gz
├── RNAindex
│ ├──
└── Snakefile

How should I create the Dockerfile and build the image if I don't want to copy some folders (such as RawReads, genome and genomeIndex) into it, but still want the Snakemake rules to be able to use them? Running the Docker container with one command should run the whole Snakemake workflow and create the results folders.

A sample Dockerfile and Snakefile are shown below:

Snakefile:

rule starAlignment:
    input:
        trimmed1="Trimmed/{id}_forward.fastq",
        trimmed2="Trimmed/{id}_reverse.fastq"
    output:
        "starOut/{id}Unmapped.out.mate1",
        "starOut/{id}Unmapped.out.mate2",
        "starOut/{id}Aligned.sortedByCoord.out.bam",
        "starOut/{id}Log.final.out"
    params:
        # STAR prepends this prefix to every output file name
        prefix="starOut/{id}"
    threads: 20
    shell:
        """
        STAR --runThreadN {threads} --genomeLoad LoadAndKeep --genomeDir genomeIndex --readFilesIn {input.trimmed1} {input.trimmed2} --outFilterIntronMotifs RemoveNoncanonical --outFileNamePrefix {params.prefix} --limitBAMsortRAM 15000000000 --quantMode GeneCounts --outSAMtype BAM SortedByCoordinate --outReadsUnmapped Fastx
        """

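Dockerfile:

FROM condaforge/mambaforge:22.9.0-3
# -y answers the confirmation prompt non-interactively during the build
RUN mamba install -y -c conda-forge -c bioconda samtools bedtools fastqc multiqc trimmomatic bwa star picard rseqc subread snakemake
# Project data is meant to be mounted here at run time, not copied in
WORKDIR /app

ENTRYPOINT ["snakemake"]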
Dockerfile Snakemake • 960 views
14 months ago

Please don't add your data to the Docker container! Rather mount external directories when running snakemake:

snakemake --use-singularity --singularity-args "-B /datafolder/outside/container/:/datafolder/inside/container/"
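For the plain Docker setup in the question, the same idea could be sketched with a bind mount. This is only an illustration: the image tag `rnaseq-snakemake` and the helper `run_workflow` are hypothetical names, and it assumes the image was built from the Dockerfile in the question (WORKDIR `/app`, ENTRYPOINT `snakemake`). By default the helper only prints the command; set `DRY_RUN=0` to actually run it.

```shell
#!/bin/sh
# Hypothetical image tag; assumes something like
# `docker build -t rnaseq-snakemake .` was run beforehand.
IMAGE="${IMAGE:-rnaseq-snakemake}"
# Host directory holding Snakefile, RawReads/, genome/, genomeIndex/, ...
PROJECT_DIR="${PROJECT_DIR:-$PWD}"

# Bind-mount the host project directory onto /app (the image's WORKDIR),
# so nothing is copied into the image. Everything after the image name is
# handed to the ENTRYPOINT, i.e. it becomes arguments to snakemake.
run_workflow() {
    set -- docker run --rm -v "$PROJECT_DIR:/app" "$IMAGE" --cores 20 "$@"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"      # dry run: only show the command
    else
        "$@"           # real run: start the container
    fi
}

run_workflow
```

Because the project directory is mounted rather than copied, results such as `starOut/` are written straight back to the host folder. Extra arguments, for example a target file, can be appended: `run_workflow starOut/sample_1Log.final.out`. On Linux the output files will typically be owned by root unless a `--user` option is added to `docker run`.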

Also, for all tools on bioconda, there are already strictly versioned containers available that you can just refer to in your Snakemake rule declaration. Effectively, there is no need for you to build a container image yourself unless you have written a custom software tool. Snakemake will ensure that all tools are run in the correct version and with the right parameters.


Thank you for your response. I would like to know whether the ENTRYPOINT in the above Dockerfile is correct. If I mount the folders externally, will the entrypoint find the data folders?

When I build and run the container, I get "Job done" as output, but with no input and no output. Could you please give me some guidance? Thank you.


There are ample blog posts about the difference between ENTRYPOINT and CMD in a Dockerfile.

However, I do not understand why you would like to manually create a Dockerfile, if you can make use of the capabilities of Snakemake to do so? Since your workflow only uses tools on conda, you can just run snakemake --containerize > Dockerfile. Done.
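A rough sketch of that route is given here. It assumes the rules in the Snakefile declare `conda:` environments, since those are what `snakemake --containerize` collects into the generated Dockerfile. The image tag `rnaseq-envs` and the `step` helper are hypothetical; by default the steps are only printed, and `DRY_RUN=0` executes them.

```shell
#!/bin/sh
# Hypothetical image tag for the generated environment image.
IMAGE="${IMAGE:-rnaseq-envs}"

# Print a step, and execute it unless DRY_RUN is 1 (the default here).
step() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = "1" ] || sh -c "$*"
}

# 1. Let Snakemake write a Dockerfile covering the workflow's conda envs.
step "snakemake --containerize > Dockerfile"
# 2. Build an image from that generated Dockerfile.
step "docker build -t $IMAGE ."
```

The image built this way holds only the software environments, so the input data still has to be mounted at run time.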

For performance reasons, I would, however, always opt to use separate containers for each tool, which you fortunately can do as easily with Snakemake.


Thank you for your response
