Question

Nextflow bwa-mem2 pipeline/channel question

1

Entering edit mode

5 months ago

megaphonecat ▴ 10

Hello,

I'm trying to write a simple bwa script for my research, but it keeps failing. I have two questions in this post.

The first code can be executed, but only the first paired fastq files are processed. Google search tells me it's because the index_ch is a queue channel instead of a value channel, but I'm unsure if it's the reason and how to tackle it.

nextflow.enable.dsl=2

params.genome = "./data/genome.fa"
params.reads = "./data/reads/*_{1,2}.fastq"
params.map = "map.csv"

process genome_index{
    conda "bwa-mem2"

    input:
    file(genome)

    output:
    file "genome.*"

    script:
    """
    bwa-mem2 index ${genome}
    """
}

process alignment{
    conda "bwa-mem2 samtools"

    input:
    tuple val(meta), path(reads)
    file(genome)
    file(index)

    output:
    tuple val(meta), path("*.bam"), emit: bam

    script:
    prefix = task.ext.prefix ?: "${meta.id}"

    """
    bwa-mem2 mem\\
        -t $task.cpus \\
        $genome \\
        $reads \\
        | samtools sort --threads $task.cpus -o ${prefix}.bam -
    """
}

workflow {
Channel
        .fromPath(params.map)
        .splitCsv(header:true)
        .map{row ->
            metaMap =[id:row.id, single:row.single_end]
            [metaMap, [file(row.fastq1), file(row.fastq2)]]
        }
        .set{file_ch}

Channel
        .fromPath(params.genome)
        .set{genome_ch}
index_ch = genome_index(genome_ch)
bwa_ch = alignment(file_ch,genome_ch,index_ch)

To overcome the previous problem. I put all the indexed files, including "genome.fa.bwt.2bit.64", into a new folder (g_index) and created its own channel. However, still, only the first pair of files were processing, and the following error showed up:

... ... Command executed:

bwa-mem2 mem\ -t 1 \ genome.fa\ read_1.fastq read_2.fastq \ | samtools sort --threads 1 -o read.bam -

Command exit status: 1 ... ... ERROR! Unable to open the file: genome.fa.bwt.2bit.64
[W::hts_set_opt] Cannot change block size for this format
samtools sort: failed to read header from "-"

Here is the second code:

params.reads = "./data/reads/*_{1,2}.fastq"
params.genome = "./data/genome.fa"
params.index = "./data/g_index/*"

process alignment{
    conda "bwa-mem2 samtools"

    input:
    tuple val(meta), path(reads)
    file(genome)
    file(index)

    output:
    tuple val(meta), path("*.bam"), emit: bam

    script:
    prefix = task.ext.prefix ?: "${meta.id}"

    """
    bwa-mem2 mem\\
        -t $task.cpus \\
        $genome \\
        $reads \\
        | samtools sort --threads $task.cpus -o ${prefix}.bam -
    """
}

Channel
        .fromPath(params.map)
        .splitCsv(header:true)
        .map{row ->
            metaMap =[id:row.id, single:row.single_end]
            [metaMap, [file(row.fastq1), file(row.fastq2)]]
        }
        .set{file_ch}
Channel
        .fromPath(params.genome)
        .set{genome_ch}
Channel
        .fromPath(params.index)
        .set{index_ch}
bwa_ch = alignment(file_ch,genome_ch,index_ch)

I've followed several basic tutorials and nf-core pipelines, but it's been difficult when writing my own script and debugging it. I appreciate any help in fixing the code and improving my coding skills on Nextflow.

Cheers!

bwa-mem2 bwa nextflow • 1.2k views

ADD COMMENT • link 5 months ago by megaphonecat ▴ 10

0

Entering edit mode

file "genome.*"

only works if the fasta is "genome.xxx"

ADD REPLY • link 5 months ago by Pierre Lindenbaum 161k

0

Entering edit mode

Try each for the first issue. https://www.nextflow.io/docs/latest/process.html#input-repeaters-each.

ADD REPLY • link 5 months ago by jkim ▴ 170

score 0 · Answer 1 · 2023-11-09

not tested, for bwa index you could create an empty file with touch to generate the basename of the indexes. Something like:

process genome_index{

    input:
        path(genome)

    output:
        path "out/${genome.getBaseName()}"

    script:
    """
    mkdir -p out
    bwa-mem2 index -p "out/${genome.getBaseName()}"  ${genome}
    touch "out/${genome.getBaseName()}"
    """
}

and then align:

    process alignment{
    conda "bwa-mem2 samtools"
    input:
         path(bwaindex)
         tuple val(sample), path(R1),path(R2)


    output:
    tuple val(sample), path("${sample}.bam"), emit: bam

    script:
    """
    bwa-mem2 mem\\
        ${bwaindex.toRealPath()} \\
        ${R1} ${R2}   | samtools sort -T TMP  -o ${sample}.bam -
    """
}

score 0 · Answer 2 · 2023-11-09

0

Entering edit mode

5 months ago

colindaven 6.4k

I've dealt with multi-file indices from vg in Nextflow before by just specifying each file which is produced by the index as outputs. Write these as separate outputs from the index process, and input for the alignment process. That worked very well.

Make sure you keep the order of inputs (in the process call in the workflow section) the same as the order of inputs in your alignment process.

If you have problems, then check the work directory which produced the error and see what has really happened there to debug.

ADD COMMENT • link 5 months ago by colindaven 6.4k

0

Entering edit mode

My first code produced index files correctly, and it seems like the index_ch is a value channel with a single value input.

The paired reads input worked fine in my trimmomatic script (all paired reads were processed) but failed in the bwa alignment.

ADD REPLY • link 5 months ago by megaphonecat ▴ 10