Nextflow: how to pass Bam and Bam index as Input Channel?
10 months ago
Eliveri ▴ 350

I would like to pass BAM files (pair_id.sorted.bam) and their corresponding index files (pair_id.sorted.bam.csi) into a Nextflow workflow. However, I am having trouble passing in the files: the line def indexFile = new File("${it.getPath()}.bai") throws the following error.

 Not a valid path value type: java.io.File ({it.getPath()}.bai)

Alternatively, would Channel.fromFilePairs be appropriate, or is it only meant for FASTQ file pairs?

nextflow.config:

input = "${directory}/*/*.sorted{.bam,.bam.csi}"

workflow.nf:

process my_process {

    input:
    tuple val(pair_id), path(pf_bam), path(pf_bam_index)

 ...    
 }

workflow {
    Channel
        .fromPath(params.input, checkIfExists: true)
        .filter{it.name.endsWith('.bam')}
        .map{
            def indexFile = new File("${it.getPath()}.bai")
            tuple(it.name.split('.sorted')[0], it, indexFile)
        }
        .set{input_ch}


    my_process(input_ch)

}
nextflow bam • 909 views
10 months ago

When I wrote something similar for CRAM and CRAI files, I just used two channels, which seems simpler and worked for me (this is Nextflow DSL2).

For example, in the workflow section:

// File inputs
input_cram = Channel.fromPath(params.cram, checkIfExists: true)
input_crai = Channel.fromPath(params.crai, checkIfExists: true)


// longshot, run split by chromosome - from cram
longshot(input_cram, input_crai, fai_channel.flatten())

And the process longshot:

/*
 *  Call SNVs from long read CRAM
 */

process longshot {

// Uncomment to use a RAM disk. Might be a bit faster and/or use less network traffic; the effect is still largely unproven.
//scratch 'ram-disk'

cpus 4
// If job fails, try again with more memory
memory { 40.GB * task.attempt }
errorStrategy 'retry'

tag "$name + $region"
label 'process_medium'

// Do not save vcf.gz tmp output 

input:
path cram
path crai
each region

output:
path vcf_reheader_gz

script:
prefix = cram.toString().tokenize('.').get(0)
name = cram
vcf =  prefix + "_" + region + ".vcf"
vcf_gz = vcf + ".gz"
vcf_reheader_gz = vcf + "_reheader.gz"

"""
longshot --region ${ region } --bam $cram --ref $params.fasta  --min_cov $params.min_cov --min_alt_count $params.min_alt_count --min_alt_frac $params.min_alt_frac --sample_id ${prefix} --out ${vcf} > ${prefix}${region + "_out.txt"} 2> ${prefix}${region + "_err.txt"}

bgzip -f -k -@ $task.cpus ${vcf}
tabix ${vcf + ".gz"}

bcftools reheader -f ${params.fai} -o $vcf_reheader_gz   ${vcf + ".gz"}

"""    
}
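
One caveat with two separate channels: once the globs match more than one sample, files are matched purely by the order in which each channel emits them, so a CRAM is not guaranteed to line up with its own index. Channel.fromFilePairs is not limited to FASTQ pairs and can keep each file together with its index. The following is a minimal sketch rather than a tested pipeline; the glob in params.input, the grouping closure, and the process name show_pair are illustrative assumptions.

```nextflow
// Hypothetical parameter: the directory layout and file naming are assumptions
params.input = "data/*/*.sorted.{bam,bam.csi}"

process show_pair {
    input:
    tuple val(pair_id), path(bam), path(index)

    script:
    """
    echo "${pair_id}: ${bam} ${index}"
    """
}

workflow {
    Channel
        .fromFilePairs(params.input, checkIfExists: true) { f ->
            // Grouping key: file name without the .sorted.bam or .sorted.bam.csi suffix
            f.name.replaceAll(/\.sorted\.bam(\.csi)?$/, '')
        }
        // Files in each group are sorted lexicographically, so the BAM comes before its index
        .map { pair_id, files -> tuple(pair_id, files[0], files[1]) }
        .set { input_ch }

    show_pair(input_ch)
}
```

The explicit grouping closure avoids relying on how the default key is derived from the glob, and the tuple keeps each index attached to its alignment file all the way into the process.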
10 months ago

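Instead of new File("${it.getPath()}.bai"), could you try just file("${it.getPath()}.bai")? A path input expects a java.nio.file.Path, which is what Nextflow's file() returns, whereas new File(...) creates a java.io.File, hence the "Not a valid path value type" error.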
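
Applied to the workflow from the question, the change could look like the sketch below. It assumes the index sits next to each BAM and is named by appending .csi, as the question's description says (the question's code appends .bai instead, so adjust the suffix to whatever actually exists on disk). The script body is a placeholder, since the original process body was elided.

```nextflow
process my_process {
    input:
    tuple val(pair_id), path(pf_bam), path(pf_bam_index)

    script:
    // Placeholder command; the real process body is not shown in the question
    """
    echo "${pair_id}: ${pf_bam} ${pf_bam_index}"
    """
}

workflow {
    Channel
        .fromPath(params.input, checkIfExists: true)
        .filter { it.name.endsWith('.bam') }
        .map { bam ->
            // file() returns a java.nio.file.Path, which path inputs accept
            // Assumption: the index is named <bam>.csi and lives beside the BAM
            def index = file("${bam}.csi", checkIfExists: true)
            tuple(bam.name - '.sorted.bam', bam, index)
        }
        .set { input_ch }

    my_process(input_ch)
}
```

Passing checkIfExists: true to file() makes a missing index fail early with a clear message instead of surfacing later inside the process.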
