Question: Final process in Nextflow not using all files in channel
4 days ago by Barry Digby (National University of Ireland, Galway)
Barry Digby wrote:

Hi,

I was hoping someone with nextflow experience could help me with this issue.

My script is a run-of-the-mill Hisat2/Stringtie Nextflow script. However, the final process takes only one file, seemingly at random, from the previous channel, and the script then finishes without any errors. Here is the portion of the code showing where the inputs to the final process originate:

#!/usr/bin/env nextflow

params.genome = "Reference/chr22.fa"
genome_fasta = files( params.genome )

params.annot = "Annotation/chr22.gtf"
Channel
        .fromPath( params.annot )
        .into { gtf1; gtf2; gtf3 }

params.reads = "trimmed_reads/*_r{1,2}.trimmed.fastq.gz"
Channel
        .fromFilePairs( params.reads )
        .set { read_ch }


......
extract exons, splice sites, align with hisat2, pipe to bam
......

process Sort_Index_Bams {
    publishDir "BAMS/", mode:'copy'

    input:
    set val(key), file(bam) from hisat_bams

    output:
    set val(key), file("${key}.bam") into hisat_bams1
    file "${key}.bam.bai" into indexed

    script:
    def avail_mem = task.memory == null ? '' : "-m ${task.memory.toBytes() / task.cpus}"
    """
    samtools sort \\
    $bam \\
    -@ ${task.cpus} $avail_mem \\
    -o ${key}.bam
    samtools index ${key}.bam
    """
}

hisat_bams1.into { hisat_bams2; hisat_bams3 }

process Assemble_Transcripts{
    publishDir "Assembly/", mode:'copy'

    input:
    set val(key), file(bam) from hisat_bams2
    file(gtf) from gtf2

    output:
    file("${key}.gtf") into hisat_transcripts

    script:
    """
    stringtie \
    ${bam} \
    -G ${gtf} \
    -l ${key} \
    -o ${key}.gtf \
    -p ${task.cpus}
    """
}

I have tried to alter the final process as such:

  • Changing the input to set val(key), file(bam) from hisat_bams2.collect(). This returned the error "Input tuple does not match input set cardinality declared by process Assemble_Transcripts".
  • Setting the input GTF file to a value channel, as described in a Stack Overflow post where the user reported the same issue. This did not solve my problem.
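
The cardinality error from the first attempt can be reproduced in isolation. The following is a minimal sketch, assuming DSL1 syntax as in the script above and using hypothetical sample keys and file names. collect() gathers every (key, bam) pair into one flattened list, so the process would receive a single item with more than two elements instead of one pair per task.

```nextflow
#!/usr/bin/env nextflow

// Hypothetical keys and file names, used only for illustration.
bams = Channel.of( ['s1', 's1.bam'], ['s2', 's2.bam'] )

// collect() emits ONE item holding all keys and files in a single flat list,
// which cannot be matched against `set val(key), file(bam)` (two elements).
bams
    .collect()
    .view()
```

Viewing the channel without collect() shows one (key, bam) pair per item, which is the shape the process input declares.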

Any suggestions would be greatly, greatly appreciated.

Regards,

Barry


It often helps when you add a tag:

process Sort_Index_Bams {
    tag "${key} ${bam}"

I usually just write:

   set key,bam from hisat_bams2

and, when I need things like the BAM index, I use the real path of the file:

   ${bam.toRealPath()}
written 4 days ago by Pierre Lindenbaum
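
The tag and toRealPath() hints in the comment above could be combined roughly as follows. This is only a sketch in DSL1 syntax: the process name Index_Stats and the output channel idxstats_ch are hypothetical, and it assumes the .bai index still sits next to the sorted BAM in the directory where Sort_Index_Bams wrote it.

```nextflow
process Index_Stats {
    // The tag shows which sample each task is handling in the run log.
    tag "${key}"

    input:
    // Bare names are received as plain values, so `bam` stays a path object.
    set key, bam from hisat_bams3

    output:
    file("${key}.idxstats") into idxstats_ch

    script:
    // toRealPath() points at the original BAM, where the index is assumed to be.
    """
    samtools idxstats ${bam.toRealPath()} > ${key}.idxstats
    """
}
```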
4 days ago by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087)
Pierre Lindenbaum wrote:

Instead of

   set val(key), file(bam) from hisat_bams2
   file(gtf) from gtf2

I would go for

 set key,bam,gtf from hisat_bams2.combine(gtf2)

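because here, gtf2 is a Channel, not a file. It emits the GTF only once, so the process runs for one BAM and then stops; combine() attaches the GTF to every (key, bam) pair.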

written 4 days ago by Pierre Lindenbaum
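
Applied to the process from the question, the combine() suggestion might look like the sketch below. It keeps the explicit val/file qualifiers from the original script instead of the bare names used in the answer; that is a choice made for this illustration, not something stated in the thread.

```nextflow
process Assemble_Transcripts {
    publishDir "Assembly/", mode:'copy'

    input:
    // combine() pairs every (key, bam) item with the single GTF,
    // so each task receives a three-element tuple.
    set val(key), file(bam), file(gtf) from hisat_bams2.combine(gtf2)

    output:
    file("${key}.gtf") into hisat_transcripts

    script:
    """
    stringtie ${bam} -G ${gtf} -l ${key} -o ${key}.gtf -p ${task.cpus}
    """
}
```

With this input declaration, one task is created per BAM, because the GTF now travels inside each item instead of arriving on a separate channel.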

Thanks Pierre, just the guy I was hoping would see this question. Your suggestion worked.

Was my code using the GTF file once in the final process, and then ending? This seems to be the behaviour if I set the input GTF as files(params.annotation).

written 4 days ago by Barry Digby

Was my code using the GTF file once in the final process, and then ending?

Yes!

written 4 days ago by Pierre Lindenbaum
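
The behaviour confirmed here can be seen in a toy pipeline. This is a minimal sketch in DSL1 syntax with hypothetical values and a hypothetical process name: a process that reads from two queue channels starts one task per matched pair of items, so it stops as soon as the shorter channel is exhausted.

```nextflow
#!/usr/bin/env nextflow

// Hypothetical values: three samples but only one annotation item.
samples = Channel.of( 'a', 'b', 'c' )
single  = Channel.of( 'annot' )

process Pair {
    input:
    val s from samples
    val g from single   // emits once, so only one task can be formed

    output:
    stdout into paired

    script:
    """
    echo "$s $g"
    """
}

// Only one line is expected here, even though `samples` holds three items.
paired.view()
```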