Question: Issue running deeptools on cluster via snakemake
camerond wrote (16 months ago):

I have a Snakemake Snakefile for an ATAC-seq pipeline that I have been using for ages, and I'm trying to add a rule to create bigWig files from BAM files. I'm using the bamCoverage program from the deepTools package to do this.

The snakemake -np output works fine, and the shell command snakemake generates for each iteration of bamCoverage runs fine if I execute it manually, i.e. not via snakemake:

bamCoverage -b bam_files/ATAC24_3_fetalMG.srtd.noMit.bam -o \
big_wig_files/ \
--outFileFormat bigwig -p 6 --ignoreDuplicates --normalizeUsing RPKM \
--blackListFileName /home/c1477909/blacklist_files/hg19.blacklist.bed \
--binSize 10 --extendReads

The snakemake error I get is this:

CalledProcessError in line 102 of ATAC_24to31_foetal_hMG_May19/Snakefile:
Command ' set -euo pipefail;  {code posted above is here} 
' returned non-zero exit status 127.
File "/c8000xd3/big-c1477909/foetal_hMG_analysis/ATAC_24to31_foetal_hMG_May19/Snakefile" 
line 102, in __rule_deeptools_make_bigwigs
File "/home/.conda/envs/snakemake/lib/python3.6/concurrent/futures/", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

This seems to be caused by the script; I've had a look at it but I'm not 100% sure what it does. I assume it has something to do with setting cores/threads for the job, and I have messed around with changing the thread setting within the rule and in the actual code, but I keep getting the same error. Line 102, which the error refers to in the Snakefile, is the last line of the rule shown below.

This is the code for the rule:

rule deeptools_make_bigwigs:
    threads: 6
    shell:
        "bamCoverage -b {input} -o {output} --outFileFormat bigwig "
        "-p 6 --ignoreDuplicates --normalizeUsing RPKM "
        "--blackListFileName /home/blacklist_files/hg19.blacklist.bed "
        "--binSize 10 --extendReads"

And I use a cluster_config file to set the number of cores/threads individually for each job sent to the cluster:

    num_cores: 1
    maxvmem: 5G
    num_cores: 1
    maxvmem: 8G
    num_cores: 8
    maxvmem: 4G
    num_cores: 8
    maxvmem: 6G
    num_cores: 6
    maxvmem: 4G
    num_cores: 6
    maxvmem: 5G
    num_cores: 6
    maxvmem: 4G
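The per-rule keys from the config didn't survive the paste above. For context, a snakemake cluster config of this shape is usually a YAML map keyed by rule name, with a __default__ fallback; a minimal sketch (the structure is the snakemake convention, and only the deeptools_make_bigwigs rule name comes from this thread):

```yaml
# Sketch of a snakemake cluster config: top-level keys are rule
# names, with __default__ applied to any rule not listed explicitly.
__default__:
    num_cores: 1
    maxvmem: 5G

deeptools_make_bigwigs:
    num_cores: 6
    maxvmem: 4G
```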

I run everything to do with this script in a conda virtual environment and used conda to install all my packages, which all appear to be compatible. I have a feeling this is something simple that I'm just not seeing.

Any suggestions would be greatly appreciated.

snakemake deeptools

Does whatever scheduler you're using export paths and other environment variables? Is deepTools in the same environment as snakemake?

— Devon Ryan

Yes and yes. All the packages I use for this are in a specific environment for this script. The environment the script is in when it is submitted to the scheduler is used as the default environment, variables and all. I have about 12 separate rules using similar path/environment variables, so I assumed it was an issue with how I was assigning cores rather than an issue with deepTools/snakemake.

— camerond
russhh (UK, U. Glasgow) wrote (16 months ago):

The most important part of that error message is this:

Command ' set -euo pipefail;  {code posted above is here} ' returned non-zero exit status 127.
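Exit status 127 is the shell's conventional code for "command not found", as opposed to a tool that ran and then failed; on a cluster it typically means the program wasn't on the PATH of the job's execution environment. A minimal demonstration:

```shell
# A POSIX shell returns 127 when it cannot locate the command at all.
sh -c 'definitely_not_a_real_command'
echo $?   # 127
```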

Can I suggest that you capture stderr to a log file when you run bamCoverage? Then, when your rule fails, have a look in the corresponding log file.

I'm currently working on a deepTools/MACS2-based ChIP-seq snakemake workflow. My bamCoverage rule looks like this at the moment, and shows how to capture stderr in a log file.

rule single_sample_bigwig:
    """
    --- Make a coverage bigWig file for a single sample

        input: {input}
        output: {output}
        params: {params}
    """
    input:
        bam = "data/job/downsample/{sequencing_sample_id}.bam",
        bai = "data/job/downsample/{sequencing_sample_id}.bai"
    output: ...
    params: ...
    log: ...
    threads: ...
    shell:
        """
        bamCoverage \
          --bam {input.bam} \
          --outFileName {output} \
          {params} \
          --numberOfProcessors {threads} \
          2> {log}
        """

From a code-review perspective: you've hard-coded -p 6 in your shell call and also in the threads value. You should really pass the number of threads from threads into your shell call, so that, should you be running with fewer than 6 threads available, you can still push your workflow through, and so you don't encode the thread count in two places.
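Applied to the rule in the question, that de-duplication might look like the following sketch. Only the bamCoverage options and the bam_files/big_wig_files directories come from the post; the wildcard patterns, output filename, and log path are assumptions:

```
rule deeptools_make_bigwigs:
    # Sketch only: the {sample} wildcard patterns and the log path
    # are assumed, not taken from the original Snakefile.
    input:
        "bam_files/{sample}.srtd.noMit.bam"
    output:
        "big_wig_files/{sample}.bw"
    log:
        "logs/bamCoverage/{sample}.log"
    threads: 6
    shell:
        "bamCoverage -b {input} -o {output} --outFileFormat bigwig "
        "-p {threads} --ignoreDuplicates --normalizeUsing RPKM "
        "--blackListFileName /home/blacklist_files/hg19.blacklist.bed "
        "--binSize 10 --extendReads 2> {log}"
```

With -p {threads}, snakemake substitutes whatever thread count the run actually grants the job (it caps threads at the cores available), so the shell call and the scheduler request can never disagree.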

— russhh

Thanks. Re: the threads, I was messing about with this because, even though I had threads: 6 written in the rule, when I did snakemake -np the output said threads: 1, which I found confusing. After what you said above, I assume that's because I only have 1 thread available in my home location.

The issue was actually a typo in the end! When I added the 2> {log} as you suggested, the log showed that the flag should have read --normalizeUsingRPKM, with no space between ...Using and RPKM. It seems to be running OK now.

— camerond

That wasn't a typo; it's due to an API redesign. --normalizeUsingRPKM would have been correct for deepTools < 3.0. You should actually be using --normalizeUsing RPKM if you're on a recent version of deepTools.

— russhh


Powered by Biostar version 2.3.0