Using shell command in Nextflow DSL=2 for BWA mem aligner
2
0
Entering edit mode
21 months ago

Hi Everyone!!

I am trying to write a nextflow script for bwa alignment, however, I am facing problem in adding read group in bwa mem command. The sample code and error is given below:

  1. Using shell block

nextflow.enable.dsl=2
params.raw = "data/*{1,2}.fastq.gz"
params.ref = "results/00_indexes/bwaidx"
params.outdir="results/04_alignments"

process BWAMEM{
    publishDir "$params.outdir/bwa.alignments"
    input:
    tuple val(sample_id), path(reads)
    output:
    path '${sample_id}.sorted.bam'
    shell:
    '''
    id=$(zcat !{reads[0]} | head -n 1 | cut -f 3-4 -d":" | sed 's/@//')
    echo "$id"
    bwa mem -M -R "$(echo "@RG\\tID:${id}\\tSM:!{sample_id}\\tPL:ILLUMINA")" -t 8 !{params.ref} !{reads[0]} !{reads[1]} | samtools sort -@8 -o !{sample_id}.sorted.bam -
    '''
}

reads_ch = Channel.fromFilePairs(params.raw, checkIfExists: true )

workflow {
  BWAMEM(reads_ch)
}
################# Error ###########################
Error executing process > 'BWAMEM (1)'

Caused by:
  Process `BWAMEM (1)` terminated with an error exit status (1)

Command executed:

  id=$(zcat S9_1.fastq.gz | head -n 1 | cut -f 3-4 -d":" | sed 's/@//')
  echo "$id"
  bwa mem -M -R "$(echo "@RG\tID:${id}\tSM:S9\tPL:ILLUMINA")" -t 8 results/00_indexes/bwaidx S9_1.fastq.gz S9_2.fastq.gz | samtools sort -@8 -o S9.sorted.bam -

Command exit status:
  1

Command output:
  000000000-JLT9H:1

Command error:
  [E::bwa_idx_load_from_disk] fail to locate the index files
  samtools sort: failed to read header from "-"

Work dir:
  /home/subudhak/Documents/nxtflow/work/73/7f89163a91a197851e8b84284c192b

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

Even usage of \ didn't help.

nextflow alihnment align bwa • 2.2k views
ADD COMMENT
3
Entering edit mode
20 months ago

[E::bwa_idx_load_from_disk] fail to locate the index files

use a full path for params.ref : /full/path/to/results/00_indexes/bwaidx

ADD COMMENT
0
Entering edit mode

Also why do you think that $id isn't getting substituted by anything in read group?

ADD REPLY
0
Entering edit mode

Thanks. Adding $baseDir worked. Here is the final code:

nextflow.enable.dsl=2
params.raw = "data/*{1,2}.fastq.gz"
params.ref = "$baseDir/results/00_indexes/bwaidx/sequence.fasta"
params.outdir="results/04_alignments"

process BWAMEM{
    publishDir "$params.outdir/bwa.alignments"
    input:
    tuple val(sample_id), path(reads)
    output:
    path "${sample_id}.sorted.bam"
    shell:
    '''
    id=$(zcat !{reads[0]} | head -n 1 | cut -f 3-4 -d":" | sed 's/@//')
    echo "$id"
    bwa mem -M -R "$(echo "@RG\\tID:${id}\\tSM:!{sample_id}\\tPL:ILLUMINA")" -t 8 !{params.ref} !{reads[0]} !{reads[1]} | samtools sort -@8 -o !{sample_id}.sorted.bam -
    '''
}

reads_ch = Channel.fromFilePairs(params.raw, checkIfExists: true )

workflow {
  BWAMEM(reads_ch)
}
ADD REPLY
0
Entering edit mode
20 months ago

If I were you, I'd write the complicated RG stuff in a separate script section of your process.

That way, you're only dealing with the complexities of groovy variables, rather than groovy and bash vars (which I find confusing).

You can then just println $yourVar to print without the complexities of the workflow.

example script section:

input:
file bam
file bai


output:
file "plots"
path "*.calmd_cov_window.txt", emit: window_txt


script:
prefix = bam.name.toString().tokenize('.').get(0)
name = bam

"""
cp -R ${params.WOCHENENDE_DIR}/plots/ .
cp -R ${params.WOCHENENDE_DIR}/scripts/ .
cp scripts/*.sh .
bash runbatch_sambamba_depth.sh


"""
ADD COMMENT
0
Entering edit mode

Thanks for the suggestion colindaven but I am newbie to nextflow. I have yet to learn what does .tokenize and emit does. I will revisit your solution, when I learn advance stuff.

ADD REPLY

Login before adding your answer.

Traffic: 1926 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6