Nextflow: Multiple jobs merged into one.
9 months ago
Alexis ▴ 40

Hi,

I am very new to Nextflow (I used to work with Snakemake). I am trying to create a dummy workflow to learn the basics of pipeline creation, as a first step toward designing my own.

For this, I want to unzip my fastq files and create 2 dummy reports from the read length of the R2 read. For now I have 2 scripts, main_qc.nf (including the workflow) and modules_qc.nf (with all processes), shown below:

main_qc.nf:

#!/usr/bin/env nextflow
nextflow.enable.dsl=2

// Include in workflow
include {
    GUNZIP_FASTQ;
} from "./modules_qc.nf"

// Initial parameters
bashdir="/shs"

params.sample = "s1"
params.fastq = "$datadir/fastqs/${params.sample}_2.fastq.gz"

workflow {
    // Create
    Channel
        .fromFile(params.fastq)
        .set { chFastqFile }

    Channel
        .of(params.sample)
        .set { samples }

    GUNZIP_FASTQ(chFastqFile)
}

modules_qc.nf:

#!/usr/bin/env nextflow
nextflow.enable.dsl=2

// Unzipping files
process GUNZIP_FASTQ {

    input:
    path target

    output:
    path "${target.simpleName}.fastq"

    script:
    """
    gunzip -d -c ${target} > ${target.simpleName}.fastq
    """
}

// Export read length to file
process GET_READ_LENGTH {

    input:
    val sample_id
    path fastq

    output:
    path "${sample_id}.readLength.txt"

    script:
    """
    bash ./shs/readLength.sh ${fastq} ${sample_id}.readLength.txt
    """
}
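The thread never shows what shs/readLength.sh actually does. Purely as a hypothetical stand-in, here is a small script with the same two-argument interface the process uses (input FASTQ, output report) that counts reads per sequence length. It assumes standard 4-line FASTQ records (no wrapped sequence lines) and GNU gzip, and the function name read_length is made up for this sketch. Because gzip -cdf decompresses on the fly and copies uncompressed input through unchanged, a script written this way would not need a separate GUNZIP step at all.

```shell
#!/bin/sh
# readLength.sh (hypothetical stand-in, not the original script)
# Usage: readLength.sh <reads.fastq | reads.fastq.gz> <output.txt>
# Writes one "length<TAB>count" line per distinct read length, sorted by length.
set -eu

read_length() {
    # Fail early if the input cannot be read (plain sh has no pipefail)
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
    # -c: to stdout, -d: decompress, -f: pass non-gzip input through unchanged
    gzip -cdf -- "$1" \
        | awk 'NR % 4 == 2 { n[length($0)]++ }         # line 2 of each record is the sequence
               END { for (l in n) print l "\t" n[l] }' \
        | sort -n > "$2"
}

# Only run when called with both arguments, so the file can also be sourced
if [ "$#" -eq 2 ]; then
    read_length "$1" "$2"
fi
```

Saved as shs/readLength.sh, it would be called the same way as in the GET_READ_LENGTH process, e.g. bash shs/readLength.sh s1_2.fastq.gz s1.readLength.txt.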


I want to first run the GUNZIP process on all fastq files of all samples, and then create one dummy report per sample. So the GUNZIP process has to run twice as many times as the other processes.

How should I proceed?

Thank you very much

nextflow pipeline fastq • 1.0k views

The channel that goes into GUNZIP_FASTQ is a queue channel: GUNZIP_FASTQ is executed once per item in that channel, until no items are left. See https://www.nextflow.io/docs/latest/channel.html#channels


Your channel chFastqFile receives only one file: $datadir/fastqs/${params.sample}_2.fastq.gz. You may want to use a * glob pattern with fromPath (as far as I remember, fromFile is deprecated), or adopt Pierre's full revamp.
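Pierre's revamp reads its input from a TSV with one sample name and FASTQ path per line (params.sample_fastq). Assuming the layout from the question, $datadir/fastqs/<sample>_2.fastq.gz, a hypothetical helper like the one below could build that file with a plain shell glob. The function name make_samplesheet and the *_2.fastq.gz pattern are assumptions for this sketch; absolute paths are written so the files can be found from any Nextflow work directory.

```shell
#!/bin/sh
# make_samplesheet.sh (hypothetical helper)
# Prints "sample<TAB>/absolute/path/<sample>_2.fastq.gz" for every R2 file in a directory.
set -eu

make_samplesheet() {
    dir=$(cd "$1" && pwd)             # resolve to an absolute directory path
    for fq in "$dir"/*_2.fastq.gz; do
        [ -e "$fq" ] || continue      # no match: the glob stays literal, so skip it
        name=$(basename "$fq")
        # strip the "_2.fastq.gz" suffix to recover the sample name
        printf '%s\t%s\n' "${name%_2.fastq.gz}" "$fq"
    done
}

# Only run when given the directory, so the file can also be sourced
if [ "$#" -eq 1 ]; then
    make_samplesheet "$1"
fi
```

For example, sh make_samplesheet.sh "$datadir/fastqs" > samples.tsv, then pass --sample_fastq samples.tsv when running Pierre's script with nextflow run.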


Thank you very much for all your insights. With all of this I was able to create what I hoped for!

9 months ago

I would write it the following way (not tested):

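#!/usr/bin/env nextflow
nextflow.enable.dsl=2

// input is a tsv file with sample and path/to/fastq (one "sample<TAB>path" per line, no header)
params.sample_fastq = ""

workflow {
    // one (sample, fastq) tuple per line of the TSV
    sample_fastq_ch = Channel
        .fromPath(params.sample_fastq)
        .splitCsv(sep: '\t')
        .map { row -> tuple(row[0], file(row[1])) }

    gunzip_ch = GUNZIP_FASTQ(sample_fastq_ch)   // one task per sample
    length_ch = GET_READ_LENGTH(gunzip_ch)      // one report per sample
    ZIPALL(length_ch.collect())                 // a single task receiving all reports
}

process GUNZIP_FASTQ {
    input:
    tuple val(sample), path(fq)

    output:
    // keep the sample id attached to the unzipped file
    tuple val(sample), path("${sample}.fastq"), emit: out

    script:
    """
    gunzip -c ${fq} > ${sample}.fastq
    """
}

process GET_READ_LENGTH {
    input:
    tuple val(sample), path(fq)

    output:
    path("${sample}.readLength.txt"), emit: out

    script:
    """
    bash /full/path/to/shs/readLength.sh ${fq} "${sample}.readLength.txt"
    """
}

process ZIPALL {
    input:
    val(L)

    output:
    path("output.zip")

    script:
    """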
    zip -j output.zip ${L.join(" ")}
    """
}

A side note: whatever readLength.sh is, you shouldn't use software that requires you to gunzip a fastq first.

9 months ago
Maxime Garcia ▴ 230

The channel going into GUNZIP_FASTQ is a queue channel, so the process is executed once for each item in that channel. See https://www.nextflow.io/docs/latest/channel.html#channels

Sorry, the anti-spambot got triggered on this somehow, restored.

Hi,

Thank you very much for your quick answer. I did manage to run the GUNZIP_FASTQ process multiple times with a wildcard, e.g.:

params.fastq = "$datadir/fastqs/${params.sample}_*.fastq.gz"

but in the end GET_READ_LENGTH still runs on only one sample (the last one).

I don't understand how the queue passes items from one task to the next.


I'm guessing the issue is that your samples channel has only one item. Because it is a queue channel, a process fed by it runs only until one of its input channels has no items left, so GET_READ_LENGTH runs just once. I'm assuming you want your samples channel to hold several values, and that you actually want to combine your samples and fastq channels into a single channel of (sample, fastq) tuples, so I'd look into the combining operators (https://www.nextflow.io/docs/latest/operator.html#combining-operators).

I'd really recommend having a look at the tutorials, they've been updated recently, and you'll get tons of information and nice examples: https://seqera.io/nextflow/learn/