Nextflow bwa-mem2 pipeline/channel question
2
1
Entering edit mode
5 months ago
megaphonecat ▴ 10

Hello,

I'm trying to write a simple bwa script for my research, but it keeps failing. I have two questions in this post.

The first code can be executed, but only the first paired fastq files are processed. Google search tells me it's because the index_ch is a queue channel instead of a value channel, but I'm unsure if it's the reason and how to tackle it.

nextflow.enable.dsl=2

params.genome = "./data/genome.fa"
params.reads = "./data/reads/*_{1,2}.fastq"
params.map = "map.csv"

process genome_index{
    conda "bwa-mem2"

    input:
    file(genome)

    output:
    file "genome.*"

    script:
    """
    bwa-mem2 index ${genome}
    """
}

process alignment{
    conda "bwa-mem2 samtools"

    input:
    tuple val(meta), path(reads)
    file(genome)
    file(index)

    output:
    tuple val(meta), path("*.bam"), emit: bam

    script:
    prefix = task.ext.prefix ?: "${meta.id}"

    """
    bwa-mem2 mem\\
        -t $task.cpus \\
        $genome \\
        $reads \\
        | samtools sort --threads $task.cpus -o ${prefix}.bam -
    """
}

workflow {
Channel
        .fromPath(params.map)
        .splitCsv(header:true)
        .map{row ->
            metaMap =[id:row.id, single:row.single_end]
            [metaMap, [file(row.fastq1), file(row.fastq2)]]
        }
        .set{file_ch}

Channel
        .fromPath(params.genome)
        .set{genome_ch}
index_ch = genome_index(genome_ch)
bwa_ch = alignment(file_ch,genome_ch,index_ch)

To overcome the previous problem. I put all the indexed files, including "genome.fa.bwt.2bit.64", into a new folder (g_index) and created its own channel. However, still, only the first pair of files were processing, and the following error showed up:

... ... Command executed:

bwa-mem2 mem\ -t 1 \ genome.fa\ read_1.fastq read_2.fastq \ | samtools sort --threads 1 -o read.bam -

Command exit status: 1 ... ... ERROR! Unable to open the file: genome.fa.bwt.2bit.64
[W::hts_set_opt] Cannot change block size for this format
samtools sort: failed to read header from "-"

Here is the second code:

params.reads = "./data/reads/*_{1,2}.fastq"
params.genome = "./data/genome.fa"
params.index = "./data/g_index/*"

process alignment{
    conda "bwa-mem2 samtools"

    input:
    tuple val(meta), path(reads)
    file(genome)
    file(index)

    output:
    tuple val(meta), path("*.bam"), emit: bam

    script:
    prefix = task.ext.prefix ?: "${meta.id}"

    """
    bwa-mem2 mem\\
        -t $task.cpus \\
        $genome \\
        $reads \\
        | samtools sort --threads $task.cpus -o ${prefix}.bam -
    """
}

Channel
        .fromPath(params.map)
        .splitCsv(header:true)
        .map{row ->
            metaMap =[id:row.id, single:row.single_end]
            [metaMap, [file(row.fastq1), file(row.fastq2)]]
        }
        .set{file_ch}
Channel
        .fromPath(params.genome)
        .set{genome_ch}
Channel
        .fromPath(params.index)
        .set{index_ch}
bwa_ch = alignment(file_ch,genome_ch,index_ch)

I've followed several basic tutorials and nf-core pipelines, but it's been difficult when writing my own script and debugging it. I appreciate any help in fixing the code and improving my coding skills on Nextflow.

Cheers!

bwa-mem2 bwa nextflow • 1.2k views
ADD COMMENT
0
Entering edit mode
file "genome.*"

only works if the fasta is "genome.xxx"

ADD REPLY
0
Entering edit mode
ADD REPLY
0
Entering edit mode
5 months ago

not tested, for bwa index you could create an empty file with touch to generate the basename of the indexes. Something like:

process genome_index{

    input:
        path(genome)

    output:
        path "out/${genome.getBaseName()}"

    script:
    """
    mkdir -p out
    bwa-mem2 index -p "out/${genome.getBaseName()}"  ${genome}
    touch "out/${genome.getBaseName()}"
    """
}

and then align:

    process alignment{
    conda "bwa-mem2 samtools"
    input:
         path(bwaindex)
         tuple val(sample), path(R1),path(R2)


    output:
    tuple val(sample), path("${sample}.bam"), emit: bam

    script:
    """
    bwa-mem2 mem\\
        ${bwaindex.toRealPath()} \\
        ${R1} ${R2}   | samtools sort -T TMP  -o ${sample}.bam -
    """
}
ADD COMMENT
0
Entering edit mode

Thank you for the response. This method is similar to my second approach, where all the indexed files and the reference genome are in one folder and are called directly. But still, the same error message showed up:

ERROR! Unable to open the file: genome.fa.bwt.2bit.64

ADD REPLY
0
Entering edit mode
5 months ago

I've dealt with multi-file indices from vg in Nextflow before by just specifying each file which is produced by the index as outputs. Write these as separate outputs from the index process, and input for the alignment process. That worked very well.

Make sure you keep the order of inputs (in the process call in the workflow section) the same as the order of inputs in your alignment process.

If you have problems, then check the work directory which produced the error and see what has really happened there to debug.

ADD COMMENT
0
Entering edit mode

My first code produced index files correctly, and it seems like the index_ch is a value channel with a single value input.

The paired reads input worked fine in my trimmomatic script (all paired reads were processed) but failed in the bwa alignment.

ADD REPLY

Login before adding your answer.

Traffic: 2596 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6