How to define Nextflow resource directives (e.g. time/cpus/memory): in the config or in the script?
23 months ago
Eliveri ▴ 350

I have a workflow for processing reads. Some reads are larger files and will require more memory/time.

I am currently defining time, cpu, memory in the script.nf like so. I am choosing the values rather broadly (guessing).

Is it better to define time/cpus/memory in the nextflow.config file? Is nextflow better able to estimate the necessary resources? Is it necessary to define time for longer processes?

//trimmomatic read trimming
process TRIM {
    ... 
    time '8h'
    cpus 12
    penv 'smp' 
    memory '128 GB'

    script:
    """
...
23 months ago

Is it better to define time/cpus/memory in the nextflow.config file?

Nextflow provides process selectors for the config file, which makes it easier to set process directives, including resource requests, in nextflow.config. You could use the label directive in your processes (in the script.nf file) to mark them as needing large, medium, or small amounts of resources. Then, in the nextflow.config file, you would have something like:

process {
  cpus = 16
  queue = 'long'
  withLabel: big_mem {
    memory = 64.GB
  }
  withLabel: small_mem {
    memory = 2.GB
  }
}
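For the selectors above to apply, the process in script.nf has to carry the matching label. A minimal sketch of what that could look like, assuming a process named TRIM as in the question; the input/output declarations and the `cp` command are placeholders for illustration, not the real trimming call:

```nextflow
// script.nf (sketch): the process only declares a label;
// the actual memory value comes from withLabel: big_mem in nextflow.config
process TRIM {
    label 'big_mem'

    input:
    path reads

    output:
    path "trimmed_${reads}"

    script:
    """
    # placeholder command; substitute the real trimming call here
    cp ${reads} trimmed_${reads}
    """
}
```

With this in place, changing the memory for every process labelled big_mem is a one-line edit in the config file.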

Besides, using nextflow.config for this makes your pipeline easier to port to other users and infrastructures. They only have to change the nextflow.config file instead of hunting for the directives in your scripts. In a simple case you may have only one script.nf, but in other scenarios there could be many script files.
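If you would rather not add labels, the withName selector can target a single process by name. As a sketch, the directives from the TRIM process in the question could be moved into nextflow.config roughly like this (the values are copied from the question; whether they suit your data is a separate matter):

```nextflow
// nextflow.config (sketch): resources for one specific process
process {
    withName: TRIM {
        cpus   = 12
        memory = 128.GB
        time   = 8.h
        penv   = 'smp'   // only meaningful for the SGE executor
    }
}
```

Someone running the pipeline on another cluster could then supply their own file with different limits, for example via `nextflow run script.nf -c their.config`, where `their.config` is a hypothetical file name.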

Is nextflow better able to estimate the necessary resources?

Nextflow doesn't do any resource estimation, but Nextflow Tower does. It will estimate better resource configurations based on previous runs and runs from other users of the same pipeline.

Is it necessary to define time for longer processes?

Well, it's up to you. If you think a process shouldn't run for longer than N minutes, and you want Nextflow to abort the task if that happens, set a time for it.
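Since you are guessing the values, one option is to start with modest requests and let them grow when a task fails, using a retry error strategy together with dynamic directives. A minimal sketch, assuming the TRIM process from the question; the starting values and retry count are arbitrary illustrations, not recommendations:

```nextflow
// nextflow.config (sketch): scale resources with each retry
process {
    withName: TRIM {
        errorStrategy = 'retry'   // retries on any failure, not only on time/memory limits
        maxRetries    = 2
        // task.attempt is 1 on the first try, 2 on the first retry, and so on
        memory = { 32.GB * task.attempt }
        time   = { 4.h * task.attempt }
    }
}
```

With these settings the first attempt asks for 32 GB and 4 hours, and a retry asks for 64 GB and 8 hours. Small inputs finish cheaply, and only the larger files pay for the bigger allocation.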

23 months ago

Is it better to define time/cpus/memory in the nextflow.config file?

Yes, if you want to share your workflow. Other users' nodes or clusters might have different limits than yours.
