Question

Nextflow: how to pass Bam and Bam index as Input Channel?

10 months ago

Eliveri ▴ 350

I would like to pass BAM files (pair_id.sorted.bam) and their corresponding index files (pair_id.sorted.bam.csi) into a Nextflow workflow. However, I am having trouble passing in the files: an error is thrown at the line def indexFile = new File("${it.getPath()}.bai").

 Not a valid path value type: java.io.File ({it.getPath()}.bai)

Alternatively, would Channel.fromFilePairs be appropriate, or is this operator only for fastq file pairs?

nextflow.config:

input = "${directory}/*/*.sorted{.bam,.bam.csi}"

workflow.nf:

process my_process {

    input:
    tuple val(pair_id), path(pf_bam), path(pf_bam_index)

 ...    
 }

workflow {
    Channel
        .fromPath(params.input, checkIfExists: true)
        .filter{it.name.endsWith('.bam')}
        .map{
            def indexFile = new File("${it.getPath()}.bai")
            tuple(it.name.split('.sorted')[0], it, indexFile)
        }
        .set{input_ch}


    my_process(input_ch)

}

nextflow bam • 909 views


score 2 · Answer 1 · 2023-07-05

When I wrote a workflow that takes CRAM and CRAI files, I just used two channels, which seemed simpler and worked for me (this is Nextflow DSL2).

E.g., in the workflow section:

// File inputs
input_cram = Channel.fromPath(params.cram, checkIfExists: true)
input_crai = Channel.fromPath(params.crai, checkIfExists: true)


// longshot, run split by chromosome - from cram
longshot(input_cram, input_crai, fai_channel.flatten())

And the longshot process:

/*
 *  Call SNVs from long read CRAM
 */

process longshot {

// Uncomment to use a RAM disk. Might be a bit faster and/or use less network traffic; the effect is still largely unproven.
//scratch 'ram-disk'

cpus 4
// If job fails, try again with more memory
memory { 40.GB * task.attempt }
errorStrategy 'retry'

tag "$name + $region"
label 'process_medium'

// Do not save vcf.gz tmp output 

input:
file cram
file crai
each region

output:
file vcf_reheader_gz

script:
prefix = cram.toString().tokenize('.').get(0)
name = cram
vcf =  prefix + "_" + region + ".vcf"
vcf_gz = vcf + ".gz"
vcf_reheader_gz = vcf + "_reheader.gz"

"""
longshot --region ${ region } --bam $cram --ref $params.fasta  --min_cov $params.min_cov --min_alt_count $params.min_alt_count --min_alt_frac $params.min_alt_frac --sample_id ${prefix} --out ${vcf} > ${prefix}${region + "_out.txt"} 2> ${prefix}${region + "_err.txt"}

bgzip -f -k -@ $task.cpus ${vcf}
tabix ${vcf + ".gz"}

bcftools reheader -f ${params.fai} -o $vcf_reheader_gz   ${vcf + ".gz"}

"""    
}
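
On the question of whether Channel.fromFilePairs only applies to FASTQ pairs: it groups any files matched by a glob, so it can also keep each alignment together with its own index. With more than one sample, two independent channels may not by themselves guarantee that pairing. Below is a minimal sketch, assuming the glob from the question's nextflow.config is available as params.input; the body of my_process is a hypothetical stand-in that only echoes its inputs.

```groovy
// Stand-in process (hypothetical body); the input declaration is the one from the question.
process my_process {
    input:
    tuple val(pair_id), path(pf_bam), path(pf_bam_index)

    output:
    stdout

    script:
    """
    echo "${pair_id}: ${pf_bam} ${pf_bam_index}"
    """
}

workflow {
    // flat: true emits [id, file1, file2] rather than [id, [file1, file2]].
    // Files in each group are sorted by name, so the .bam is expected
    // to come before the .bam.csi; check this against your own file names.
    Channel
        .fromFilePairs(params.input, flat: true, checkIfExists: true)
        .set { input_ch }

    my_process(input_ch)
}
```

The grouping key is derived from the part of the file name matched by the glob's wildcard, so it is worth running input_ch.view() once to confirm that the key and the file order are what the process expects.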

score 1 · Answer 2 · 2023-07-05

10 months ago

Pierre Lindenbaum 161k

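Instead of new File("${it.getPath()}.bai"), can you please try just file("${it.getPath()}.bai")? A path input expects the java.nio.file.Path object that Nextflow's file() returns, not a java.io.File, which is what the error message is complaining about.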
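
Applied to the workflow from the question, the change could look roughly like the sketch below. It assumes params.input is the glob from the question; the body of my_process is a hypothetical stand-in, and the .bai suffix simply mirrors the question's code (switch it to .csi if that is the index actually on disk).

```groovy
// Stand-in process (hypothetical body); the input declaration is the one from the question.
process my_process {
    input:
    tuple val(pair_id), path(pf_bam), path(pf_bam_index)

    output:
    stdout

    script:
    """
    echo "${pair_id}: ${pf_bam} ${pf_bam_index}"
    """
}

workflow {
    Channel
        .fromPath(params.input, checkIfExists: true)
        .filter { bam -> bam.name.endsWith('.bam') }
        .map { bam ->
            // file() returns a java.nio.file.Path, which a `path` input accepts.
            // checkIfExists makes the run fail early if the index is missing.
            def index = file("${bam}.bai", checkIfExists: true)
            tuple(bam.name.split('.sorted')[0], bam, index)
        }
        .set { input_ch }

    my_process(input_ch)
}
```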
