Nextflow: transform multiple outputs of one process into paired outputs and use them as input to the next process
10 months ago
MolGeek ▴ 50

Hi everyone,

I am trying to learn how to build workflows with Nextflow, and I want to make an ATAC-seq workflow. I have 2 sets of paired-end ATAC-seq data.

First, I perform adapter trimming with Trim Galore. The outputs of Trim Galore are the two trimmed FASTQ files for each set.

Then I want those trimmed FASTQ files to be used as input to bowtie2, but I haven't found a way of transforming pairedTrimmedCh into a paired-file channel in order to run the alignment process.

Any help?

Thanks in advance!

#!/usr/bin/env nextflow

params.reads = "./*{1,2}.fastq"
params.outdir = "bams"
params.INDEX = "path_to_index"
params.cpus = 10

log.info """\

    A T A C S E Q - N F   P I P E L I N E
    ===================================
    Genome: ${params.INDEX}
    reads        : ${params.reads}
    outdir       : ${params.outdir}
    """
    .stripIndent(true)



process trimReads {

    publishDir "$params.outdir/", mode: 'copy'
    input:
    tuple val(sampleid), path(reads)

    output:
    path "./trimmed/" 

    script:
    """
TrimGalore-0.6.7/trim_galore --cores 7 --paired --no_report_file ${reads[0]} ${reads[1]}  -o ./trimmed/

    """
}


process alignment {
    publishDir "$params.outdir/", mode: 'copy'

    input:
    tuple val(sampleid), path(reads)

    output:
    path "${sampleid}.mm10.sorted.bam", emit: bams
    path "${sampleid}.mm10.sorted.bam.bai"

    script:
    """
    bowtie2 --local -X 2000 -p ${params.cpus} -x ${params.INDEX} -1 ${reads[0]} -2 ${reads[1]} | samtools view -b -h -S -q 10 -f 0x2 | samtools sort -@ ${params.cpus}  > ${sampleid}.mm10.sorted.bam
    samtools index ${sampleid}.mm10.sorted.bam
    """
}

workflow {
    // Create a channel with fastqs. If paired-end, use .fromFilePairs
    Channel
        .fromFilePairs(params.reads, checkIfExists: true)
        .set { read_ch }

    pairedTrimmedCh = trimReads(read_ch).groupTuple()

    align_ch = alignment(pairedTrimmedCh)
}

The way you provide the index will cause a problem; for a solution, see the thread "bowtie2 Mapping Using Pre-built Index in Nextflow".


I am aware of it. I provide the index as a variable, such as

params.INDEX = "path_to_Bowtie2Index/genome"

and then

 ${params.INDEX}

That is not going to work, I assume. You cannot stage basenames (index prefixes). You can stage the folder with the index files, but you then need some trick to find the files in it, as I described in that other thread.
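As a rough illustration of that kind of trick (not necessarily the one from the other thread): if the whole index folder is staged, the prefix can be recovered by looking for a file that a standard bowtie2 index contains exactly once, `<prefix>.rev.1.bt2`, and stripping that suffix. The helper name `find_bt2_prefix` is hypothetical, and large indexes use `.bt2l` extensions, which this sketch does not handle.

```shell
# Hypothetical helper: recover the bowtie2 index prefix from a staged index directory.
# Assumes a standard (small) index, i.e. files named <prefix>.1.bt2 ... <prefix>.rev.2.bt2.
find_bt2_prefix() {
    local dir="$1"
    local hit
    # <prefix>.rev.1.bt2 exists once per index, so stripping its suffix yields the prefix.
    hit=$(find "$dir" -maxdepth 1 -type f -name "*.rev.1.bt2" | head -n 1)
    if [ -z "$hit" ]; then
        echo "no bowtie2 index found in $dir" >&2
        return 1
    fi
    echo "${hit%.rev.1.bt2}"
}
```

Its output can then be passed to `bowtie2 -x`.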


It works! :)

10 months ago

Not tested. In trimReads, I usually do something like this (I'm not sure about the file names generated by trim_galore, please check):

(...)
output:
   path("fastqs.tsv"),emit:output
script:
"""
(...)
find \${PWD}/trimmed -type f -name "*trimmed.fq.gz" | sort | paste - - | awk 'BEGIN {printf("sample\\tR1\\tR2\\n")} {printf("${sampleid}\\t%s\\n", \$0)}' > fastqs.tsv
"""

Then read the output and use it in alignment:

alignment( pairedTrimmedCh.output.splitCsv(sep:'\t',header:true) )

In alignment:

    input:
    val(row)

    output:
    path "${row.sample}.mm10.sorted.bam", emit: bams
    path "${row.sample}.mm10.sorted.bam.bai"

    script:
    """
    bowtie2 --local -X 2000 -p ${params.cpus} -x ${params.INDEX} -1 ${row.R1} -2 ${row.R2} | samtools view -b -h -S -q 10 -f 0x2 | samtools sort -@ ${params.cpus}  > ${row.sample}.mm10.sorted.bam
    samtools index ${row.sample}.mm10.sorted.bam
    """
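Outside Nextflow, the effect of `splitCsv(sep:'\t', header:true)` can be approximated to dry-run the per-row commands: each data line becomes one record whose fields are named by the header (`sample`, `R1`, `R2`), which is what `row.sample`, `row.R1` and `row.R2` refer to. The sketch below is a hypothetical bash helper (`dry_run_alignment`) that only prints the bowtie2 command for each row; the index argument is a placeholder.

```shell
# Hypothetical dry-run helper: mimic splitCsv(sep:'\t', header:true) by skipping
# the header line and reading each remaining row into sample / r1 / r2.
dry_run_alignment() {
    local tsv="$1" index="$2"
    tail -n +2 "$tsv" | while IFS=$'\t' read -r sample r1 r2; do
        # Print (do not run) the bowtie2 call that the alignment process would make for this row.
        echo "bowtie2 --local -X 2000 -x ${index} -1 ${r1} -2 ${r2}"
    done
}
```

For example, `dry_run_alignment fastqs.tsv path_to_index` prints one line per sample.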

After a few adjustments to the find command, it worked! Thank you.

For anyone dealing with the same problem with Trim Galore, these are the adjustments I made that worked:

from:

find -type f -name \${PWD}/trimmed  -type f -name "*trimmed.fq.gz" | sort | paste - - | awk 'BEGIN {printf("sample\tR1\tR2\\n");} {printf("{sampleid}\t%s\\n",\$0;}' > fastqs.tsv

to:

 find \${PWD}/trimmed -type f -name "*.fq" | sort | paste - - | awk 'BEGIN {printf("sample\\tR1\\tR2\\n")} {printf("${sampleid}\\t%s\\n", \$0)}' > fastqs.tsv
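The corrected command can be rehearsed in plain bash before putting it into the process. The sketch below is a hypothetical helper (`make_fastqs_tsv`, not part of Trim Galore or Nextflow); because it runs outside a Nextflow script block, it uses single escapes (`\t`, `$0`) and passes the sample id with `awk -v` instead of `${sampleid}`. It assumes the trimmed directory holds exactly one sample's R1/R2 pair whose names sort R1 first, as with Trim Galore's `--paired` output (`*_val_1.fq` / `*_val_2.fq`); check the names in your own `trimmed/` folder.

```shell
# Hypothetical helper: build the sample/R1/R2 TSV for one sample's trimmed reads.
make_fastqs_tsv() {
    local sampleid="$1" trimmed_dir="$2"
    # sort puts R1 (_val_1) before R2 (_val_2); paste - - joins each pair of lines with a tab.
    find "$trimmed_dir" -type f -name "*.fq" | sort | paste - - |
        awk -v id="$sampleid" 'BEGIN {printf("sample\tR1\tR2\n")} {printf("%s\t%s\n", id, $0)}'
}
```

For example, `make_fastqs_tsv S1 "$PWD/trimmed" > fastqs.tsv`. Absolute paths (hence `${PWD}` in the command above) matter because `alignment` receives the paths as plain values (`val(row)`), so Nextflow does not stage those files into the alignment task's work directory.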