I am writing a Nextflow script to be run on an HPC cluster in an activated conda environment. Before I added a nextflow.config file, everything ran fine. Once I added it, the TRIM step (trimmomatic) still seems to run fine, but the ALIGN step (bwa mem/samtools sort) returns an empty alignment, so the subsequent steps end up with empty reads.
Is there something wrong with my config file? The TRIM step output is as expected, but the ALIGN step returns an empty .bam. There is no warning or error until a later step in the workflow, which uses these outputs.
I added an env.yml file:
name: conda environment
channels:
- defaults
- bioconda
- conda-forge
dependencies:
- trimmomatic=0.36
- bwa=0.7.17
- samtools=1.6
My nextflow.config file:
params {
    outdir = "./results"
}
singularity {
    enabled = true
    autoMounts = true
}
conda {
    conda = './env.yml'
    enabled = true
}
process {
    executor = "sge"
    scratch = true
    stageInMode = "copy"
    stageOUtMode="move"
    errorStrategy = "retry"
    clusterOptions = "-S /bin/bash -o job.log -e job.err"
    conda = './env.yml'
    withName: TRIM {
        conda = 'bioconda::trimmomatic=0.36'
    }
    withName: BWA_PF {
        conda = 'bioconda::bwa=0.7.17'
        conda = 'bioconda::samtools=1.6'
    }
}
My main.nf file:
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

/*
 * pipeline input parameters
 */
params.reads = "$projectDir/data/*_R{1,2}.fastq.gz"
params.mindex = "$projectDir/ref/myref"
params.outdir = "$projectDir/results"

log.info """\
    R N A S E Q - N F P I P E L I N E
    ===================================
    reads : ${params.reads}
    myindex : ${params.myindex}
    outdir : ${params.outdir}
    """
    .stripIndent()

//trimmomatic read trimming
process TRIM {
    tag "trim ${pair_id}"
    publishDir "${params.outdir}/$pair_id"

    input:
    tuple val(pair_id), path(reads)

    output:
    tuple val(pair_id), path("trimmed_${pair_id}_R{1,2}_{paired,unpaired}.fastq.gz")

    conda 'bioconda::trimmomatic'

    script:
    """
    #!/usr/bin/env bash
    trimmomatic \
    PE ${reads[0]} ${reads[1]} \
    "trimmed_${pair_id}_R1_paired.fastq.gz" "trimmed_${pair_id}_R1_unpaired.fastq.gz" \
    "trimmed_${pair_id}_R2_paired.fastq.gz" "trimmed_${pair_id}_R2_unpaired.fastq.gz" \
    ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:True \
    LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15
    """
}

//bwa alignment
process ALIGN {
    tag "align ${pair_id}"
    publishDir "${params.outdir}/$pair_id"

    input:
    tuple val(pair_id), path(reads)
    path index

    output:
    tuple val(pair_id), path("${pair_id}_mapped.{bam,bam.bai}")

    conda 'bioconda::bwa'
    conda 'bioconda::samtools'
    time '2h'
    cpus 8
    penv 'smp'
    memory '30 GB'

    script:
    """
    #!/usr/bin/env bash
    bwa mem -t 10 -M -k 25 $index/allPf.fasta \
    ${reads[0]} ${reads[2]} | samtools sort -@ 10 -o "${pair_id}_mapped.bam" \
    && samtools index "${pair_id}_mapped.bam"
    """
}
Please share the .nextflow.log and .nextflow.sh files of the bwa mem work directory.
${reads[0]} ${reads[2]} => shouldn't this be [1]?
It is reads[0] and reads[2] due to the naming of the files. This should be correct, as it ran fine before I added a nextflow.config file.
Is this the log file? Sorry, I am rather new to Nextflow, so I am not sure where the .nextflow.log and .nextflow.sh files are.
I found job.log. It seems the bwa command was indeed not found? But there were no errors thrown during the run; it just continued to the next step.
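On why nothing was thrown: one possible explanation (an assumption, since the job.log contents are not shown here) is that bash by default reports the exit status of the last command in a pipeline. If bwa is missing but samtools sort at the end of the pipe exits normally, the step can look successful. A minimal bash sketch of that behaviour, where the command name is a made-up stand-in for the missing tool and cat stands in for the downstream command:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a tool that is not on PATH (e.g. a missing bwa).
missing_tool="definitely_not_installed_tool_xyz"

# Default behaviour: the pipeline's exit status is that of the last command (cat),
# so the "command not found" failure on the left-hand side is hidden.
status_default=$(bash -c '"$1" 2>/dev/null | cat >/dev/null; echo "$?"' _ "$missing_tool")

# With pipefail: the pipeline fails if any command in it fails,
# so the status is 127 ("command not found").
status_pipefail=$(bash -c 'set -o pipefail; "$1" 2>/dev/null | cat >/dev/null; echo "$?"' _ "$missing_tool")

echo "without pipefail: $status_default"   # prints 0
echo "with pipefail:    $status_pipefail"  # prints 127
```

If this is what is happening, adding `set -euo pipefail` as the first line after the shebang in the ALIGN script block would be one way to make the task fail right at the bwa step instead of passing an empty .bam downstream.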