Question

BWA alignment failing after adding nextflow.config file?

2

3.0 years ago

Eliveri ▴ 350

I am writing a Nextflow script to be run on an HPC cluster in an activated conda environment. Since I added a nextflow.config file, the Trimmomatic step still seems to run fine, but the ALIGN step (bwa mem piped into samtools sort) returns an empty alignment, so the subsequent steps end up with no reads. Before I added the nextflow.config file, everything ran fine.

Is there something wrong with my config file? The TRIM step output is as expected, but the ALIGN step returns an empty .bam. There is no warning or error until a later step in the workflow that uses the ALIGN output.

I added an env.yml file:

name: conda environment
channels: 
  - defaults
  - bioconda
  - conda-forge
dependencies: 
  - trimmomatic=0.36
  - bwa=0.7.17
  - samtools=1.6

My nextflow.config file:

params {
    outdir = "./results"
}

singularity {
    enabled = true
    autoMounts = true
}

conda {
    conda = './env.yml'
    enabled = true
}

process {
    executor = "sge"
    scratch = true
    stageInMode = "copy"
    stageOutMode = "move"
    errorStrategy = "retry"
    clusterOptions = "-S /bin/bash -o job.log -e job.err"
    conda = './env.yml'

    withName: TRIM {
        conda = 'bioconda::trimmomatic=0.36'
    }
    withName: BWA_PF {
        conda = 'bioconda::bwa=0.7.17'
        conda = 'bioconda::samtools=1.6'
    }
}

My main.nf file:

#!/usr/bin/env nextflow
nextflow.enable.dsl=2

/*
 * pipeline input parameters
 */
params.reads = "$projectDir/data/*_R{1,2}.fastq.gz"
params.myindex = "$projectDir/ref/myref"
params.outdir = "$projectDir/results"

log.info """\
    R N A S E Q - N F   P I P E L I N E
    ===================================
    reads           : ${params.reads}
    myindex        : ${params.myindex}
    outdir          : ${params.outdir}
    """
    .stripIndent()

//trimmomatic read trimming
process TRIM {

    tag "trim ${pair_id}"   

    publishDir "${params.outdir}/$pair_id"

    input:
    tuple val(pair_id), path(reads) 

    output:
    tuple val(pair_id), path("trimmed_${pair_id}_R{1,2}_{paired,unpaired}.fastq.gz")

    conda 'bioconda::trimmomatic'

    script:
    """
    #!/usr/bin/env bash
    trimmomatic \
        PE ${reads[0]} ${reads[1]} \
        "trimmed_${pair_id}_R1_paired.fastq.gz" "trimmed_${pair_id}_R1_unpaired.fastq.gz" \
        "trimmed_${pair_id}_R2_paired.fastq.gz" "trimmed_${pair_id}_R2_unpaired.fastq.gz" \
        ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:True \
        LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15
    """


}


//bwa alignment
process ALIGN {

    tag "align ${pair_id}"

    publishDir "${params.outdir}/$pair_id"

    input:
    tuple val(pair_id), path(reads)
    path index

    output:
    tuple val(pair_id), path("${pair_id}_mapped.{bam,bam.bai}")

    conda 'bioconda::bwa'
    conda 'bioconda::samtools'

    time '2h'
    cpus 8
    penv 'smp' 
    memory '30 GB'

    script:
    """
    #!/usr/bin/env bash
    bwa mem -t 10 -M -k 25 $index/allPf.fasta \
    ${reads[0]} ${reads[2]} | samtools sort -@ 10 -o "${pair_id}_mapped.bam" \
    && samtools index "${pair_id}_mapped.bam"
    """
}

cluster nextflow hpc meme bwa • 2.2k views

ADD COMMENT • link updated 3.0 years ago by ATpoint 90k • written 3.0 years ago by Eliveri ▴ 350

1

Please share the .nextflow.log and .nextflow.sh files of the bwa mem work directory.

${reads[0]} ${reads[2]} => shouldn't this be [1]?

ADD REPLY • link 3.0 years ago by ATpoint 90k

0

It is reads[0] and reads[2] due to the naming of the files. This should be correct as it ran fine before I added a nextflow.config file.

Is this the log file? Sorry, I am rather new to Nextflow, so I am not sure where the .nextflow.log and .nextflow.sh files are.

2022-11-11 12:24:59 20m 42s     peaceful_stallman       ERR     4375c6d175  033f9cbb-c8ec-4d71-970d-ff15e2933839    nextflow run script_workflow_elucidator.nf                              
2022-11-11 15:46:03 21m 7s      evil_meucci             ERR     5ac9fa1c78  b5aa8776-9f67-428e-9a24-15ec85ff9493    nextflow run script_workflow_elucidator.nf                              
2022-11-11 16:09:11 6m 18s      big_poisson             ERR     5ac9fa1c78  82819f60-8faf-4973-b344-ea30d6a4a1cf    nextflow run script_workflow_elucidator.nf

I found job.log

mytest/work/48/dd93a1b2345cb7516f45c680dc7e3d/.command.sh: line 3: bwa: command not found

It seems the bwa command was indeed not found? But no errors were thrown during the run; it just continued to the next step.
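
One plausible reason the run carried on: in a shell pipeline such as `bwa mem ... | samtools sort ...`, the exit status is that of the last command unless `pipefail` is set, so a missing `bwa` can go unnoticed. Below is a minimal sketch of a nextflow.config setting that makes such failures fatal. It is not part of the original pipeline, and it rests on an assumption worth verifying: as far as I know, a script block that starts with its own shebang (like the `#!/usr/bin/env bash` line in ALIGN) is run with that interpreter instead, so that line would have to be dropped for the setting to take effect.

```groovy
// nextflow.config (sketch under the assumptions stated above)
process {
    // Run bash task scripts with:
    //   -e          stop at the first failing command
    //   -u          treat unset variables as errors
    //   -o pipefail fail a pipeline if any command in it fails, not only the last one
    shell = ['/bin/bash', '-euo', 'pipefail']
}
```

With this in place, a `command not found` on the left side of the pipe should make the task itself fail instead of passing an empty file downstream.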

ADD REPLY • link 3.0 years ago by Eliveri ▴ 350

score 2 · Answer 1 · 2022-11-11

2

3.0 years ago

Eliveri ▴ 350

I think I may have found the error! Thanks to @ATpoint, I learned to look at the log files and saw that the bwa command was indeed not found. The mistake is in my config file, where I had two "conda = " lines in the same block. This does not work because the second line overrides the first. It is better to replace them with a single conda = "./path_of_env.yml" instead.

process {
    ...
    withName: TRIM {
        conda = 'bioconda::trimmomatic=0.36'
    }
    withName: BWA_PF {
        conda = 'bioconda::bwa=0.7.17'
        conda = 'bioconda::samtools=1.6' // ERROR HERE: this overrides the previous line, so bwa is not found
    }
}
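
For reference, a minimal sketch of how both tools could be requested for the same process without pointing to a yml file. The conda setting takes a single string, and several packages can be listed in it separated by spaces. The selector name and version pins are copied from the config above; whether BWA_PF actually matches the process name used in the script is an assumption to check.

```groovy
// nextflow.config (sketch; selector name and versions taken from the config above)
process {
    withName: BWA_PF {
        // One assignment with space-separated packages, so that bwa and
        // samtools are installed into the same conda environment.
        conda = 'bioconda::bwa=0.7.17 bioconda::samtools=1.6'
    }
}
```

The two consecutive `conda` directives inside the ALIGN process in main.nf presumably behave the same way, so they would need the same treatment.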

ADD COMMENT • link 3.0 years ago by Eliveri ▴ 350

0

Glad you solved it. Yes, it is probably preferable to use dedicated yml files for this, as described in https://www.nextflow.io/docs/latest/conda.html#use-conda-environment-files

ADD REPLY • link 3.0 years ago by ATpoint 90k