Hi all,
I am trying to pass the output of a STAR genome index process to a STAR alignment process in Nextflow, but I keep running into tuple/variable issues. Here’s a minimal reproducible example of my setup.
STAR index process
process STAR {
    publishDir params.outdir, mode: "copy"
    label 'process_high'
    input:
    tuple path(reference), path(gtf)
    output:
    path 'STAR_index/' emit: index_dir
    script:
    """
    mkdir -p STAR_index
    STAR --runThreadN $task.cpus \
        --runMode genomeGenerate \
        --genomeDir STAR_index \
        --genomeFastaFiles $reference \
        --sjdbGTFfile $gtf
    """
}
Workflow snippet
// Channel with sample reads
Channel.fromFilePairs(params.reads)
    | set { align_ch }
// STAR index
star_index_result = STAR(star_index_ch)
// Flatten reads and prepare tuples
pre_aligned_input_ch = align_ch.map { sample, reads ->
    tuple(sample, reads.toArray())
}
aligned_input = pre_aligned_input_ch.combine(star_index_result.index_dir) { sample_tuple, index_dir ->
    def sample = sample_tuple[0]
    def reads = sample_tuple[1]
    tuple(index_dir, sample, *reads)
}
STAR_ALIGN(aligned_input)
align_ch emits tuples like [sample_name, [read1, read2]]. I want aligned_input tuples to look like [index_dir, sample1, read1, read2].
Reads are named like *_{R1,R2}.subset.fastq.gz, where the wildcard is the sample name.
STAR align process
process STAR_ALIGN {
    publishDir params.outdir, mode: "copy"
    label 'process_high'
    input:
    tuple path(index_dir), val(sample), path(reads)
    output:
    tuple val(sample), path("*.bam"), emit: bam
    tuple val(sample), path("*.Log.final.out"), emit: log
    script:
    """
    STAR \
        --runThreadN $task.cpus \
        --genomeDir ${index_dir} \
        --readFilesIn ${reads.join(' ')} \
        --readFilesCommand zcat \
        --outFileNamePrefix ${sample}_ \
        --outSAMtype BAM SortedByCoordinate \
        2> ${sample}.Log.final.out
    """
}
Problem
The tuples I am feeding into the STAR align process are not valid, and I get one of three errors. The first is:
ERROR ~ No such variable: path -- Check script 'modules/star/main.nf' at line: 12 or see '.nextflow.log' file for more details
The others are a DataflowVariable error and a "not a valid path" error.
My goal is to massage the input to STAR_ALIGN so that it receives a tuple like this:
[index_dir, sample_name, [read1, read2]]
My current attempts with .combine() and *reads fail with either a DataflowVariable error or the "No such variable: path" error.
Question
How can I properly construct a Nextflow channel/tuple so that each sample is paired with the STAR index directory and the list of reads, in a format acceptable for the STAR_ALIGN process input:
tuple path(index_dir), val(sample), path(reads)
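For illustration, the target shape can be sketched with a small, self-contained DSL2 script. This is only a sketch under stated assumptions: plain strings stand in for the real index directory and FASTQ files, the names index_ch and reads_ch are hypothetical, and it assumes that combine, called without a closure, concatenates the single index item with each [sample, reads] tuple.

```nextflow
// Sketch only: plain strings stand in for the real index directory and FASTQ files.
// `index_ch` and `reads_ch` are hypothetical names, not part of the pipeline above.
workflow {
    // Stand-in for STAR(...).index_dir, which emits a single item
    index_ch = Channel.of('STAR_index')

    // Stand-in for Channel.fromFilePairs(params.reads): [sample, [read1, read2]]
    reads_ch = Channel.of(
        ['sample1', ['sample1_R1.subset.fastq.gz', 'sample1_R2.subset.fastq.gz']],
        ['sample2', ['sample2_R1.subset.fastq.gz', 'sample2_R2.subset.fastq.gz']]
    )

    // combine pairs the single index item with every sample tuple;
    // the reads stay nested as one list element
    aligned_input = index_ch.combine(reads_ch)

    // Each emitted item should have the shape [index_dir, sample, [read1, read2]]
    aligned_input.view()
}
```

If the same shape holds with the real channels, the nested reads list would line up with path(reads) in the STAR_ALIGN input, so spreading the reads with *reads would probably not be needed.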