Final process in Nextflow not using all files in channel
4.3 years ago
Barry Digby ★ 1.3k

Hi,

I was hoping someone with Nextflow experience could help me with this issue.

My script is a run-of-the-mill HISAT2/StringTie Nextflow script. However, the final process takes only one file, apparently at random, from the previous channel, and the script then finishes without any errors. Here is the portion of the code showing where the inputs to the final process originate:

#!/usr/bin/env nextflow

params.genome = "Reference/chr22.fa"
genome_fasta = files( params.genome )

params.annot = "Annotation/chr22.gtf"
Channel
        .fromPath( params.annot )
        .into { gtf1; gtf2; gtf3 }

params.reads = "trimmed_reads/*_r{1,2}.trimmed.fastq.gz"
Channel
        .fromFilePairs( params.reads )
        .set { read_ch }


......
extract exons, splice sites, align with hisat2, pipe to bam
......

process Sort_Index_Bams {
    publishDir "BAMS/", mode:'copy'

    input:
    set val(key), file(bam) from hisat_bams

    output:
    set val(key), file("${key}.bam") into hisat_bams1
    file "${key}.bam.bai" into indexed

    script:
    def avail_mem = task.memory == null ? '' : "-m ${task.memory.toBytes() / task.cpus}"
    """
    samtools sort \\
    $bam \\
    -@ ${task.cpus} $avail_mem \\
    -o ${key}.bam
    samtools index ${key}.bam
    """
}

hisat_bams1.into { hisat_bams2; hisat_bams3 }

process Assemble_Transcripts{
    publishDir "Assembly/", mode:'copy'

    input:
    set val(key), file(bam) from hisat_bams2
    file(gtf) from gtf2

    output:
    file("${key}.gtf") into hisat_transcripts

    script:
    """
    stringtie \
    ${bam} \
    -G ${gtf} \
    -l ${key} \
    -o ${key}.gtf \
    -p ${task.cpus}
    """
}

I have tried to alter the final process as such:

  • Changing the input to set val(key), file (bam) from hisat_bams2.collect(). This returned the error "Input tuple does not match input set cardinality declared by process Assemble_Transcripts".
  • Setting the input GTF file to a value channel, as described in a Stack Overflow post where the user reported the same issue. However, it did not solve my problem.
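
For reference, the cardinality error from the first attempt can be reproduced with a minimal sketch like the one below. It uses DSL1 syntax, as in the script above, so it assumes a Nextflow release that still supports DSL1; the sample names and the channel name bams_demo are placeholders, not part of the pipeline.

```groovy
#!/usr/bin/env nextflow
// DSL1 sketch; sample and file names are made-up placeholders.
Channel
    .from( ['sampleA', 'sampleA.bam'], ['sampleB', 'sampleB.bam'] )
    .set { bams_demo }

// collect() gathers every item into ONE emission and, by default, flattens
// the tuples, so a process would receive a single 4-element list rather
// than a (key, bam) pair. That no longer matches "set val(key), file(bam)".
bams_demo
    .collect()
    .view()
```

In other words, collect() is useful when one task should see all files at once, but not when each sample should get its own task.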

Any suggestions would be greatly, greatly appreciated.

Regards,

Barry


It often helps when you add a tag:

process Sort_Index_Bams {
    tag "${key} ${bam}"

I usually just write:

   set key,bam from hisat_bams2

and, when I need things like the BAM index, I use the real path of the file:

   ${bam.toRealPath()}
4.3 years ago

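Instead of

   set val(key), file(bam) from hisat_bams2
   file(gtf) from gtf2

I would go for

 set key,bam,gtf from hisat_bams2.combine(gtf2)

because here gtf2 is a queue channel, not a file. It emits a single item, so a process that reads from it alongside hisat_bams2 can only run once. combine() pairs the GTF with every (key, bam) tuple instead.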
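
As a sketch of how this could look in the original process, here is Assemble_Transcripts with the combined input. It is a fragment, not a standalone script: it assumes the hisat_bams2 and gtf2 channels from the question and keeps the file() qualifiers so that both files are staged into the task directory. DSL1 syntax, as in the question.

```groovy
process Assemble_Transcripts {
    publishDir "Assembly/", mode: 'copy'

    input:
    // combine() emits one (key, bam, gtf) tuple per BAM, so the single GTF
    // is reused for every sample instead of being consumed once.
    set val(key), file(bam), file(gtf) from hisat_bams2.combine(gtf2)

    output:
    file("${key}.gtf") into hisat_transcripts

    script:
    """
    stringtie \\
    ${bam} \\
    -G ${gtf} \\
    -l ${key} \\
    -o ${key}.gtf \\
    -p ${task.cpus}
    """
}
```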


Thanks Pierre, just the guy I was hoping would see this question. Your suggestion worked.

Was my code using the GTF file once in the final process, and then ending? This seems to be the behaviour if I set the input GTF as files(params.annotation).


"Was my code using the GTF file once in the final process, and then ending?"

Yes!
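
The behaviour can be seen in a toy script such as the following sketch. All channel names, process names and values are invented for illustration, and it uses DSL1 syntax, so it assumes a Nextflow release that still supports DSL1.

```groovy
#!/usr/bin/env nextflow
// DSL1 toy example; all names and values here are made up for illustration.

samples_a = Channel.from('s1', 's2', 's3')   // queue channel, 3 items
annot_a   = Channel.from('chr22.gtf')        // queue channel, 1 item

process Paired_Inputs {
    echo true

    input:
    val sample from samples_a
    val gtf from annot_a

    // Runs once: annot_a is empty after the first task, so the process stops.
    script:
    """
    echo "paired: ${sample} ${gtf}"
    """
}

samples_b = Channel.from('s1', 's2', 's3')
annot_b   = Channel.from('chr22.gtf')

process Combined_Inputs {
    echo true

    input:
    set val(sample), val(gtf) from samples_b.combine(annot_b)

    // Runs three times: combine() emits one (sample, gtf) pair per sample.
    script:
    """
    echo "combined: ${sample} ${gtf}"
    """
}
```

The first process mirrors the situation in the question, where the number of tasks is limited by the shortest queue channel; the second mirrors the combine() fix.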
