Question: Issue running deeptools on cluster via snakemake
camerond wrote (16 months ago):

I have a Snakemake Snakefile for an ATAC-seq pipeline that I have been using for ages, and I'm trying to add a rule to create bigWig files from BAM files. I'm using the bamCoverage program from the deepTools package to do this.

The snakemake -np output works fine, and the shell command snakemake generates for each iteration of bamCoverage runs fine if I execute it manually, i.e. not via snakemake:

bamCoverage -b bam_files/ATAC24_3_fetalMG.srtd.noMit.bam -o \
big_wig_files/ \
--outFileFormat bigwig -p 6 --ignoreDuplicates --normalizeUsing RPKM \
--blackListFileName /home/c1477909/blacklist_files/hg19.blacklist.bed \
--binSize 10 --extendReads

The snakemake error I get is this:

CalledProcessError in line 102 of ATAC_24to31_foetal_hMG_May19/Snakefile:
Command ' set -euo pipefail;  {code posted above is here} 
' returned non-zero exit status 127.
File "/c8000xd3/big-c1477909/foetal_hMG_analysis/ATAC_24to31_foetal_hMG_May19/Snakefile" 
line 102, in __rule_deeptools_make_bigwigs
File "/home/.conda/envs/snakemake/lib/python3.6/concurrent/futures/", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

This seems to be caused by the script; I've had a look at it but I'm not 100% sure what it does. I assume it has something to do with setting cores/threads for the job, and I have messed around with changing the thread setting within the rule and in the actual code, but I keep getting the same error. Line 102, which the error refers to in the Snakefile, is the last line of the rule shown below.

This is the code for the rule:

rule deeptools_make_bigwigs:
    threads: 6
    shell:
        "bamCoverage -b {input} -o {output} --outFileFormat bigwig "
        "-p 6 --ignoreDuplicates --normalizeUsing RPKM "
        "--blackListFileName /home/blacklist_files/hg19.blacklist.bed "
        "--binSize 10 --extendReads"

And I use a cluster_config file to set the number of cores/threads individually for each job sent to the cluster:

    num_cores: 1
    maxvmem: 5G
    num_cores: 1
    maxvmem: 8G
    num_cores: 8
    maxvmem: 4G
    num_cores: 8
    maxvmem: 6G
    num_cores: 6
    maxvmem: 4G
    num_cores: 6
    maxvmem: 5G
    num_cores: 6
    maxvmem: 4G
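The per-rule keys from the config didn't survive the paste above. For context, a snakemake cluster config of this shape is usually a YAML map keyed by rule name, with a __default__ fallback; a minimal sketch (the structure is the snakemake convention, and only the deeptools_make_bigwigs rule name comes from this thread):

```yaml
# Sketch of a snakemake cluster config: top-level keys are rule
# names, with __default__ applied to any rule not listed explicitly.
__default__:
    num_cores: 1
    maxvmem: 5G

deeptools_make_bigwigs:
    num_cores: 6
    maxvmem: 4G
```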

I run everything to do with this script in a conda virtual environment and used conda to install all my packages, which all appear to be compatible. I have a feeling this is something simple that I'm just not seeing.

Any suggestions would be greatly appreciated.

snakemake deeptools

Does whatever scheduler you're using export paths and other environment variables? Is deepTools in the same environment as snakemake?

— Devon Ryan

Yes and yes. All the packages I use for this are in a specific environment for this script. The environment the script is in when it is submitted to the scheduler is used as the default environment, variables and all. I have about 12 separate rules using similar path/environment variables, so I assumed it was an issue with how I was assigning cores rather than an issue with deepTools/snakemake.

— camerond
russhh (UK, U. Glasgow) wrote (16 months ago):

The most important part of that error message is this:

Command ' set -euo pipefail;  {code posted above is here} ' returned non-zero exit status 127.
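Exit status 127 is the shell's conventional code for "command not found", as opposed to a tool that ran and then failed; on a cluster it typically means the program wasn't on the PATH of the job's execution environment. A minimal demonstration:

```shell
# A POSIX shell returns 127 when it cannot locate the command at all.
sh -c 'definitely_not_a_real_command'
echo $?   # 127
```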

Can I suggest that you capture stderr to a log file when you run bamCoverage? Then, when your rule fails, have a look in the corresponding log file.

I'm currently working on a deepTools/MACS2-based ChIP-seq snakemake workflow. My bamCoverage rule looks like this at the moment, and shows how to capture stderr in a log file.

rule single_sample_bigwig:
    """
    --- Make a coverage bigWig file for a single sample

        input: {input}
        output: {output}
        params: {params}
    """
    input:
        bam = "data/job/downsample/{sequencing_sample_id}.bam",
        bai = "data/job/downsample/{sequencing_sample_id}.bai"
    output: ...
    params: ...
    log: ...
    threads: ...
    shell:
        """
        bamCoverage \
          --bam {input.bam} \
          --outFileName {output} \
          {params} \
          --numberOfProcessors {threads} \
          2> {log}
        """

From a code-review perspective: you've hard-coded -p 6 in your shell call and also in the threads value. You should really pass the number of threads from threads into your shell call, so that, should you be running with fewer than 6 threads available, you can still push your workflow through, and so you don't encode the thread count in two places.
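Applied to the rule in the question, that de-duplication might look like the following sketch. Only the bamCoverage options and the bam_files/big_wig_files directories come from the post; the wildcard patterns, output filename, and log path are assumptions:

```
rule deeptools_make_bigwigs:
    # Sketch only: the {sample} wildcard patterns and the log path
    # are assumed, not taken from the original Snakefile.
    input:
        "bam_files/{sample}.srtd.noMit.bam"
    output:
        "big_wig_files/{sample}.bw"
    log:
        "logs/bamCoverage/{sample}.log"
    threads: 6
    shell:
        "bamCoverage -b {input} -o {output} --outFileFormat bigwig "
        "-p {threads} --ignoreDuplicates --normalizeUsing RPKM "
        "--blackListFileName /home/blacklist_files/hg19.blacklist.bed "
        "--binSize 10 --extendReads 2> {log}"
```

With -p {threads}, snakemake substitutes whatever thread count the run actually grants the job (it caps threads at the cores available), so the shell call and the scheduler request can never disagree.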

— russhh

Thanks. Re: the threads, I was messing about with this because, even though I had threads: 6 written in the rule, when I did snakemake -np the output said threads: 1, which I found confusing. After what you said above, I assume that's because I only have 1 thread available in my home location.

The issue was actually a typo in the end! When I added the 2> {log} as you suggested, the log showed that the flag should have read --normalizeUsingRPKM, with no space between ...Using and RPKM. It seems to be running OK now.

— camerond

That wasn't a typo; it's due to an API redesign. --normalizeUsingRPKM would have been correct for deepTools < 3.0. You should actually be using --normalizeUsing RPKM if you're on a recent version of deepTools.

— russhh


Powered by Biostar version 2.3.0