Question

how to pass trimmed fastP output to bwa in Nextflow

0

Entering edit mode

18 months ago

rohitsatyam102 ▴ 840

Hi Everyone!!

I am trying to pass the FASTP trimmed files to BWAMEM process in nextflow and for some reason after running the code mentioned below (dsl2), I only get one bam file. The FASTP runs on all 24 files but BWAMEM process runs only on single file. I don't understand what's wrong. I am trying to imitate this pipeline


params.memory = "3g"
params.cpus = 1
params.output = "."

process FASTP{
    cpus params.cpus
    memory params.memory
    publishDir "${params.output}/02_adapterTrimming", mode: 'copy'

    input:
        tuple val(sid), path(reads)

    output:
        tuple val(sid), file(fq_1_paired), file(fq_2_paired), emit: trimmed_reads
                file("${sid}.fastp_stats.json")
                file("${sid}.fastp_stats.html")

        script:
    fq_1_paired = sid + '_R1_P.fastq.gz'
    fq_2_paired = sid + '_R2_P.fastq.gz'
    """
    fastp \
    --in1 ${reads[0]} \
    --in2 ${reads[1]}\
    --out1 $fq_1_paired \
    --out2 $fq_2_paired \
    --json ${sid}.fastp_stats.json \
    --html ${sid}.fastp_stats.html
    """
}

process BWAMEM{
        publishDir "${params.output}/03_alignment", mode: 'copy'
        memory params.memory
        cpus params.cpus

        input:
        tuple val(sid), file(reads1), file(reads2)
        val(reference)

        output:
        path "*.sorted.bam", emit: alignments
        path "*.bai"

        shell:
        '''
        ref=$(echo !{reference} | sed -e 's/\\.[^.]*$//')
        id=$(zcat !{reads1} | head -n 1 | cut -f 3-4 -d":" | sed 's/@//')
        bwa mem -M -R "$(echo "@RG\\tID:${id}\\tSM:!{sid}\\tPL:ILLUMINA")" -t !{task.cpus} ${ref} !{reads1} !{reads2} | samtools sort -@ !{task.cpus} -o !{sid}.sorted.bam -
        samtools index -@ !{task.cpus} !{sid}.sorted.bam
        '''
}

and the workflow section is

if (params.input != false) {
            Channel.fromFilePairs(params.input, checkIfExists: true )
                .set { input_fastqs }
        }
workflow{
    reference_ch=BWAINDEX.out.bwa_idx.flatten().filter(~/.*fai/)
    FASTP(input_fastqs)
    BWAMEM(FASTP.out[0], reference_ch)
}

I tried BWAMEM(FASTP.out.trimmed_reads.groupTuple(), reference_ch) as well but it's aligning a single sample

N E X T F L O W  ~  version 21.10.6
Launching `main.nf` [pedantic_montalcini] - revision: aeba1fa55a
executor >  local (26)
[11/767e8f] process > BWAINDEX   [100%] 1 of 1 ✔
[96/644b6d] process > FASTP (24) [100%] 24 of 24 ✔
[97/2a2e3d] process > BWAMEM (1) [100%] 1 of 1 ✔

fastp bwa nextflow • 1.6k views

ADD COMMENT • link 18 months ago by rohitsatyam102 ▴ 840

score 3 · Accepted Answer · 2022-10-24

3

Entering edit mode

18 months ago

Maxime Garcia ▴ 340

Hi,

Your issue is actually not the trim reads, but the bwa index. Can you try with adding .collect() like this:

reference_ch=BWAINDEX.out.bwa_idx.flatten().filter(~/.*fai/).collect()

Basically the issue was due to the fact that the channel was a queue channel and was getting consumed, with only 1 item inside, you only get one run. The .collect() operator should help it make it out as a value channel. cf https://www.nextflow.io/docs/latest/operator.html#collect

ADD COMMENT • link 18 months ago by Maxime Garcia ▴ 340

0

Entering edit mode

It works. However, now it gives me the following error. Is there a way to apply toRealPath() with basename?

N E X T F L O W  ~  version 21.10.6
Launching `main.nf` [high_ardinghelli] - revision: 86556531ff
executor >  local (49)
[c3/7283af] process > BWAINDEX   [100%] 1 of 1 ✔
[b8/06c905] process > FASTP (23) [100%] 24 of 24 ✔
[aa/f1fc0f] process > BWAMEM (6) [  0%] 0 of 24
[/home/subudhak/Documents/COVID_Project/nextCov/work/c3/7283afa8ec4bf917d966c37ec390f9/Sars_cov_2.ASM985889v3.dna.toplevel.fa.fai]
Error executing process > 'BWAMEM (5)'

Caused by:
  Process `BWAMEM (5)` terminated with an error exit status (1)

Command executed:


  id=$(zcat 21_S21_L001_R1_P.fastq.gz | head -n 1 | cut -f 3-4 -d":" | sed 's/@//')
  bwa mem -M -R "$(echo "@RG\tID:${id}\tSM:21_S21_L001\tPL:ILLUMINA")" -t 1 Sars_cov_2.ASM985889v3.dna.toplevel.fa 21_S21_L001_R1_P.fastq.gz 21_S21_L001_R2_P.fastq.gz | s
amtools sort -@ 1 -o 21_S21_L001.sorted.bam -
  samtools index -@ 1 21_S21_L001.sorted.bam

Command exit status:
  1

Command output:
  (empty)
executor >  local (49)
[c3/7283af] process > BWAINDEX    [100%] 1 of 1 ✔
[b8/06c905] process > FASTP (23)  [100%] 24 of 24 ✔
[ba/be5a18] process > BWAMEM (15) [100%] 23 of 23, failed: 23
[/home/subudhak/Documents/COVID_Project/nextCov/work/c3/7283afa8ec4bf917d966c37ec390f9/Sars_cov_2.ASM985889v3.dna.toplevel.fa.fai]
Error executing process > 'BWAMEM (5)'

Command error:
  [E::bwa_idx_load_from_disk] fail to locate the index files
  samtools sort: failed to read header from "-"

Work dir:
  /home/subudhak/Documents/COVID_Project/nextCov/work/37/36a0ef6525b44c33e678e87e8cc517

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

ADD REPLY • link 18 months ago by rohitsatyam102 ▴ 840

0

Entering edit mode

Thanks I resolved this issue with emitting .fa file directly and using toRealPath

ADD REPLY • link 18 months ago by rohitsatyam102 ▴ 840

score 2 · Accepted Answer · 2022-10-24

2

Entering edit mode

18 months ago

Maxime Garcia ▴ 340

Any more information about what actually went wrong in the .command.log or .command.err in the work folder related to that particular task?

ADD COMMENT • link 18 months ago by Maxime Garcia ▴ 340

1

Entering edit mode

The final code used was

params.memory = "3g"
params.cpus = 1
params.output = "."


process BWAINDEX{
        publishDir "${params.output}/00_index", mode: 'copy'
        memory params.memory
        cpus params.cpus

        input:
        path (fasta)

        output:
        path "$fasta", emit: bwa_idx        <--------------------Made changes here ######################
        path (fasta)

        script:
        """
        bwa index $fasta
        gatk CreateSequenceDictionary -R $fasta
        samtools faidx $fasta
        """
        }
process FASTP{
    cpus params.cpus
    memory params.memory
    publishDir "${params.output}/02_adapterTrimming", mode: 'copy'

    input:
        tuple val(sid), path(reads)

    output:
        tuple val(sid), file(fq_1_paired), file(fq_2_paired), emit: trimmed_reads
                file("${sid}.fastp_stats.json")
                file("${sid}.fastp_stats.html")

        script:
    fq_1_paired = sid + '_R1_P.fastq.gz'
    fq_2_paired = sid + '_R2_P.fastq.gz'
    """
    fastp \
    --in1 ${reads[0]} \
    --in2 ${reads[1]}\
    --out1 $fq_1_paired \
    --out2 $fq_2_paired \
    --json ${sid}.fastp_stats.json \
    --html ${sid}.fastp_stats.html
    """
}

process BWAMEM{
        publishDir "${params.output}/03_alignment", mode: 'copy'
        memory params.memory
        cpus params.cpus

        input:
        tuple val(sid), file(reads1), file(reads2)
        path(reference)                 <--------------------Made changes here ######################

        output:
        path "*.sorted.bam", emit: alignments
        path "*.bai"

        shell:
        '''
                                                                    <--------------------Made changes here (deleted this line) ######################
        id=$(zcat !{reads1} | head -n 1 | cut -f 3-4 -d":" | sed 's/@//')
        bwa mem -M -R "$(echo "@RG\\tID:${id}\\tSM:!{sid}\\tPL:ILLUMINA")" -t !{task.cpus} !{reference.toRealPath()} !{reads1} !{reads2} | samtools sort -@ !{task.cpus} -o !{sid}.sorted.bam -     <--------------------Made changes here ######################
        samtools index -@ !{task.cpus} !{sid}.sorted.bam
        '''
}

if (params.input != false) {
            Channel.fromFilePairs(params.input, checkIfExists: true )
                .set { input_fastqs }
        }
workflow{
    BWAINDEX(params.ref)
    FASTP(input_fastqs)
   BWAMEM(FASTP.out[0], BWAINDEX.out[0])        <--------------------Made changes here ######################
}

ADD REPLY • link 18 months ago by rohitsatyam102 ▴ 840