I am writing a Nextflow script to be run on an HPC cluster in an activated conda environment. Before I added a nextflow.config file, everything ran fine. Since adding it, the Trimmomatic step still seems to run fine, but the ALIGN step (bwa mem piped into samtools sort) returns an empty alignment, so the subsequent steps end up with no reads.
Is there something wrong with my config file? The TRIM step output is as expected, but the ALIGN step returns an empty .bam. There is no warning or error until a later step in the workflow that uses the ALIGN output.
I added an env.yml file:
name: conda environment
channels:
  - defaults
  - bioconda
  - conda-forge
dependencies:
  - trimmomatic=0.36
  - bwa=0.7.17
  - samtools=1.6
My nextflow.config file:
params {
    outdir = "./results"
}
singularity {
    enabled = true
    autoMounts = true
}
conda {
    conda = './env.yml'
    enabled = true
}
process {
    executor = "sge"
    scratch = true
    stageInMode = "copy"
    stageOUtMode="move"
    errorStrategy = "retry"
    clusterOptions = "-S /bin/bash -o job.log -e job.err"
    conda = './env.yml'
    withName: TRIM {
        conda = 'bioconda::trimmomatic=0.36'
    }
    withName: BWA_PF {
        conda = 'bioconda::bwa=0.7.17'
        conda = 'bioconda::samtools=1.6'
    }
}
My main.nf file:
#!/usr/bin/env nextflow
nextflow.enable.dsl=2
/*
 * pipeline input parameters
 */
params.reads = "$projectDir/data/*_R{1,2}.fastq.gz"
params.mindex = "$projectDir/ref/myref"
params.outdir = "$projectDir/results"
log.info """\
    R N A S E Q - N F P I P E L I N E
    ===================================
    reads : ${params.reads}
    myindex : ${params.myindex}
    outdir : ${params.outdir}
    """
    .stripIndent()
//trimmomatic read trimming
process TRIM {
    tag "trim ${pair_id}"
    publishDir "${params.outdir}/$pair_id"
    input:
    tuple val(pair_id), path(reads)
    output:
    tuple val(pair_id), path("trimmed_${pair_id}_R{1,2}_{paired,unpaired}.fastq.gz")
    conda 'bioconda::trimmomatic'
    script:
    """
    #!/usr/bin/env bash
    trimmomatic \
    PE ${reads[0]} ${reads[1]} \
    "trimmed_${pair_id}_R1_paired.fastq.gz" "trimmed_${pair_id}_R1_unpaired.fastq.gz" \
    "trimmed_${pair_id}_R2_paired.fastq.gz" "trimmed_${pair_id}_R2_unpaired.fastq.gz" \
    ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:True \
    LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15
    """
}
//bwa alignment
process ALIGN {
    tag "align ${pair_id}"
    publishDir "${params.outdir}/$pair_id"
    input:
    tuple val(pair_id), path(reads)
    path index
    output:
    tuple val(pair_id), path("${pair_id}_mapped.{bam,bam.bai}")
    conda 'bioconda::bwa'
    conda 'bioconda::samtools'
    time '2h'
    cpus 8
    penv 'smp'
    memory '30 GB'
    script:
    """
    #!/usr/bin/env bash
    bwa mem -t 10 -M -k 25 $index/allPf.fasta \
    ${reads[0]} ${reads[2]} | samtools sort -@ 10 -o "${pair_id}_mapped.bam" \
    && samtools index "${pair_id}_mapped.bam"
    """
}
Please share the .nextflow.log and .nextflow.sh files of the bwa mem work directory.
${reads[0]} ${reads[2]} => shouldn't this be [1]?
It is reads[0] and reads[2] due to the naming of the files. This should be correct as it ran fine before I added a nextflow.config file.
Is this the log file? Sorry, I am rather new to Nextflow, so I am not sure where the .nextflow.log and .nextflow.sh files are.
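On where to look: Nextflow writes .nextflow.log in the directory the pipeline was launched from, and each task gets its own directory under work/ (the default location) containing .command.sh, .command.log and .command.err. A minimal sketch for printing the stderr of every task that wrote one, assuming the default work/ directory; show_task_errors is a made-up helper name, not a Nextflow command:

```shell
#!/usr/bin/env bash
# show_task_errors is a hypothetical helper, not part of Nextflow.
# It prints every non-empty .command.err found under a work directory
# (defaults to ./work, which is an assumption about the run layout).
show_task_errors() {
    local workdir="${1:-work}"
    find "$workdir" -type f -name '.command.err' -size +0 | sort | while read -r f; do
        echo "== $f"
        cat "$f"
    done
}
```

Calling `show_task_errors work` from the launch directory should surface messages such as a missing executable, if any task recorded one.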
I found job.log. It seems the bwa command was indeed not found? But there were no errors thrown during the run; it just continued to the next step.
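One possible reason for the silent failure, offered as an assumption rather than a diagnosis: in bash, the exit status of a pipeline is that of its last command unless `pipefail` is set, so a missing command on the left of a pipe does not by itself make the script fail. The sketch below illustrates this with a deliberately nonexistent command name (hypothetical, standing in for a `bwa` that is not on the PATH) piped into `cat`:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for an aligner that is not installed or not on PATH.
missing_cmd="definitely_not_installed_aligner"

# Default behaviour: the pipeline reports the status of the last command (cat),
# so the failure on the left-hand side is hidden and the output file is empty.
"$missing_cmd" 2>/dev/null | cat > out_default.txt
status_default=$?

# With pipefail: the pipeline reports the rightmost non-zero status,
# here 127 ("command not found") from the left-hand side.
set -o pipefail
"$missing_cmd" 2>/dev/null | cat > out_pipefail.txt
status_pipefail=$?
set +o pipefail

echo "default=$status_default pipefail=$status_pipefail"
# prints: default=0 pipefail=127
```

If this is what happened in the ALIGN step, adding `set -euo pipefail` at the top of the task script would be one way to make the step fail loudly instead of passing an empty file downstream.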