Nextflow: How to format input tuple for STAR_ALIGN process with STAR index directory
DdogBoss, 7 hours ago

Hi all,

I am trying to pass the output of a STAR genome index process to a STAR alignment process in Nextflow, but I keep running into tuple/variable issues. Here’s a minimal reproducible example of my setup.

STAR index process

    process STAR {
    publishDir params.outdir, mode: "copy"
    label 'process_high'

    input:
        tuple path(reference), path(gtf)

    output:
        path 'STAR_index', emit: index_dir

    script:
    """
    mkdir -p STAR_index
    STAR --runThreadN $task.cpus \
         --runMode genomeGenerate \
         --genomeDir STAR_index \
         --genomeFastaFiles $reference \
         --sjdbGTFfile $gtf
    """
    }

Workflow snippet

    // Channel with sample reads
    Channel.fromFilePairs(params.reads)
    | set { align_ch }
    // STAR index
    star_index_result = STAR(star_index_ch)
    // Flatten reads and prepare tuples
    pre_aligned_input_ch = align_ch.map { sample, reads ->
        tuple(sample, reads.toArray())
    }
    aligned_input = pre_aligned_input_ch.combine(star_index_result.index_dir) { sample_tuple, index_dir ->
        def sample = sample_tuple[0]
        def reads  = sample_tuple[1]
        tuple(index_dir, sample, *reads)
    }
    STAR_ALIGN(aligned_input)

align_ch returns tuples like [sample_name, [read1, read2]]. I want aligned_input tuples to look like [index_dir, sample1, read1, read2].

Read files are named like *_{R1,R2}.subset.fastq.gz, where the wildcard is the sample name.
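To make these shapes concrete, here is a minimal standalone sketch. The sample and file names are hypothetical, and they are plain strings so no files need to exist. It mimics what fromFilePairs emits, combines each item with a single stand-in index item, and reorders with map() into [index_dir, sample, [read1, read2]]. That three-element form is what `tuple path(index_dir), val(sample), path(reads)` expects; the four-element form [index_dir, sample, read1, read2] would instead need separate path(R1), path(R2) inputs.

```nextflow
// Standalone sketch with hypothetical dummy values (strings, not real files)
workflow {
    // Same shape that fromFilePairs emits: [sample, [R1, R2]]
    align_ch = Channel.of(
        ['sampleA', ['sampleA_R1.subset.fastq.gz', 'sampleA_R2.subset.fastq.gz']],
        ['sampleB', ['sampleB_R1.subset.fastq.gz', 'sampleB_R2.subset.fastq.gz']]
    )
    // Stand-in for the single item emitted by the STAR index output
    index_ch = Channel.of('STAR_index')

    // combine() appends the right-hand item to each left-hand tuple,
    // so each item becomes [sample, [R1, R2], index_dir]
    combined_ch = align_ch.combine(index_ch)
    combined_ch.view { "combined: $it" }

    // map() reorders; the read pair stays a single list element,
    // matching `tuple path(index_dir), val(sample), path(reads)`
    combined_ch
        .map { sample, reads, index_dir -> [index_dir, sample, reads] }
        .view { "reordered: $it" }
}
```

Run with `nextflow run`, this should print one "combined" and one "reordered" line per sample; the relative order of the lines is not guaranteed.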

STAR align process

    process STAR_ALIGN {
    publishDir params.outdir, mode: "copy"
    label 'process_high'

    input:
    tuple path(index_dir), val(sample), path(reads)

    output:
    tuple val(sample), path("*.bam"), emit: bam
    tuple val(sample), path("*Log.final.out"), emit: log  // STAR itself writes <prefix>Log.final.out

    script:
    """
    STAR \
        --runThreadN $task.cpus \
        --genomeDir ${index_dir} \
        --readFilesIn ${reads.join(' ')} \
        --readFilesCommand zcat \
        --outFileNamePrefix ${sample}_ \
        --outSAMtype BAM SortedByCoordinate
    """
    }

Problem

The tuples I am feeding into the STAR align process are not valid, and I get either this error:

    ERROR ~ No such variable: path -- Check script 'modules/star/main.nf' at line: 12 or see '.nextflow.log' file for more details

or a DataflowVariable error, or a "not a valid path" error.

My goal is to massage the input to STAR_ALIGN so that it receives a tuple like this:

    [index_dir, sample_name, [read1, read2]]

My current attempts with .combine() and *reads either throw a DataflowVariable error or fail because the path variable is missing.
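Two details of the combine() attempt are worth isolating. As far as the Nextflow operator reference goes, combine() accepts the right-hand channel plus an optional by: option and has no reshaping-closure argument, so any reordering has to happen in a separate map() step. Separately, *reads is Groovy's spread operator: inside a list literal it splices the two reads into separate elements. A small sketch with hypothetical values shows the resulting element counts:

```nextflow
// Hypothetical values illustrating Groovy's spread operator in a list literal
workflow {
    def index_dir = 'STAR_index'
    def sample = 'sampleA'
    def reads = ['sampleA_R1.subset.fastq.gz', 'sampleA_R2.subset.fastq.gz']

    def spread = [index_dir, sample, *reads]  // reads spliced in: 4 elements
    def nested = [index_dir, sample, reads]   // reads kept as one element: 3 elements

    // 4 elements fit `tuple path(index_dir), val(sample), path(R1), path(R2)`;
    // 3 elements fit `tuple path(index_dir), val(sample), path(reads)`
    assert spread.size() == 4
    assert nested.size() == 3
    println "spread: $spread"
    println "nested: $nested"
}
```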

Question

How can I properly construct a Nextflow channel/tuple so that each sample is paired with the STAR index directory and its list of reads, in a format accepted by the STAR_ALIGN process input?

    tuple path(index_dir), val(sample), path(reads)
Answer, 2 hours ago

align_ch returns tuples like [sample_name, [read1, read2]]. I want aligned_input tuples to look like [index_dir, sample1, read1, read2].

    STAR_ALIGN(
        // emits [index_dir, sample, R1, R2] for each sample
        star_index_result.combine(
            align_ch.map { sn, reads -> [sn, reads[0], reads[1]] }
        )
    )

and in STAR_ALIGN:

    input:
        tuple path(index_dir), val(sample), path(R1), path(R2)
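A common alternative, sketched here under assumptions rather than taken from the answer: keep the index out of the per-sample tuple and give STAR_ALIGN a separate path input. If STAR's output is a queue channel (as it is when its input comes from Channel.of), its single index item would be consumed by the first sample and STAR_ALIGN would run only once; first() turns it into a value channel that every sample can reuse. The params.reference and params.gtf names are hypothetical, and the include path assumes the index process lives in modules/star/main.nf (the file named in the error message).

```nextflow
// Sketch: index passed as its own input instead of inside the tuple.
// Assumption: STAR (the index process) is defined in ./modules/star/main.nf;
// params.reference and params.gtf are hypothetical parameter names.
include { STAR } from './modules/star/main.nf'

process STAR_ALIGN {
    publishDir params.outdir, mode: "copy"
    label 'process_high'

    input:
    tuple val(sample), path(reads)   // [sample, [R1, R2]] straight from fromFilePairs
    path index_dir                   // one STAR index, reused for every sample

    output:
    tuple val(sample), path("*.bam"), emit: bam
    tuple val(sample), path("*Log.final.out"), emit: log  // STAR writes <prefix>Log.final.out

    script:
    """
    STAR \
        --runThreadN $task.cpus \
        --genomeDir ${index_dir} \
        --readFilesIn ${reads.join(' ')} \
        --readFilesCommand zcat \
        --outFileNamePrefix ${sample}_ \
        --outSAMtype BAM SortedByCoordinate
    """
}

workflow {
    // One [reference, gtf] pair for the index process (hypothetical params)
    star_index_ch = Channel.of([file(params.reference), file(params.gtf)])
    align_ch = Channel.fromFilePairs(params.reads)

    STAR(star_index_ch)

    // first() converts the single-item index output into a value channel,
    // so it is re-read for every sample instead of being consumed once
    STAR_ALIGN(align_ch, STAR.out.index_dir.first())
}
```

The design choice here is mostly about readability: the per-sample tuple keeps the exact shape fromFilePairs produces, and the shared resource is declared once as its own input.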