Run Nextflow Process in For Loop from Chromosomes 1 through N?
2
10 months ago
Eliveri ▴ 350

My Nextflow workflow uses a for loop in the script block to run the same process for chromosomes 1 through n. Is there a more efficient, Nextflow-like way to run this process that would also let the output be organized in directories by chromosome rather than by sample id?

process my_process {

    // But want output to somehow be in a folder by chromosome rather than id?
    publishDir "${outdir}/${id}"

    input:
    tuple val(...), path(id)

    output:
    tuple val(id), path("${id}.chr{1,2,3,...n}.g.vcf"),
          path("${id}.chr{1,2,3,...n}.g.vcf.idx")

    script:
    """
    for i in 1 2 3 ...n
    do
        gatk --java-options -I ...
    done
    """
}

workflow {
    Channel
        .fromPath(params.input, checkIfExists: true)
        .map {tuple( it.name.split('.sorted')[0], it )}
        .set{input_ch}

    my_process(input_ch) 

}
nextflow
2
10 months ago

As you're running GATK, you must provide a reference, so there must be an associated .fai index file. You can just grab the first column of the .fai file, which contains the chromosome names.

each_contig = Channel
    .fromPath("${params.fasta}.fai")
    .splitCsv(header: false, sep: '\t')
    .map { row -> row[0] }

and then combine this with something else, e.g. some BAMs:

 my_process(each_bam.combine(each_contig))

so you'll parallelize by chromosome.
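Putting the pieces together, a minimal sketch of how this could look end to end. The process name `per_chrom`, `params.outdir`, and the `HaplotypeCaller` command line are illustrative assumptions, not taken from the thread; the point is that `combine` produces one (id, bam, chromosome) tuple per pair, and `publishDir` can use the chromosome value to get one output directory per chromosome, as the question asks:

```nextflow
// Sketch only: process and parameter names are illustrative.
process per_chrom {
    tag "${id}:${chrom}"

    // One output directory per chromosome rather than per sample id.
    publishDir "${params.outdir}/${chrom}", mode: 'copy'

    input:
    tuple val(id), path(bam), val(chrom)

    output:
    tuple val(id), val(chrom), path("${id}.${chrom}.g.vcf")

    script:
    """
    gatk HaplotypeCaller -R ${params.fasta} -I ${bam} -L ${chrom} -O ${id}.${chrom}.g.vcf
    """
}

workflow {
    each_bam = Channel
        .fromPath(params.input, checkIfExists: true)
        .map { tuple(it.name.split('.sorted')[0], it) }

    each_contig = Channel
        .fromPath("${params.fasta}.fai")
        .splitCsv(sep: '\t')
        .map { row -> row[0] }

    // One task per (bam, chromosome) pair, all eligible to run in parallel.
    per_chrom(each_bam.combine(each_contig))
}
```

Because `combine` emits the Cartesian product of the two channels, the scheduler sees n independent tasks per sample instead of one task containing a shell loop.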

1
10 months ago

Writing a for loop within a process will not allow Nextflow to send jobs out to your machine/cluster in parallel; it will just run the iterations sequentially inside a single task. This is not what you want.

Have a look at this thread https://groups.google.com/g/nextflow/c/vikd30XKK8Q where someone solved the problem you have, though it will surely need a bit of adjustment for the modern GATK parameters.
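The core idea in that thread is to move the loop variable out of the script block and into the process inputs, so Nextflow spawns one task per chromosome. A sketch using Nextflow's built-in `each` repeater qualifier (the process name and the elided GATK arguments are illustrative, kept as in the original question):

```nextflow
process call_one_chrom {
    input:
    tuple val(id), path(bam)
    each chrom    // Nextflow repeats this process once per value of chrom

    output:
    path("${id}.chr${chrom}.g.vcf")

    script:
    """
    gatk --java-options ... -L chr${chrom} -I ${bam} ...
    """
}

workflow {
    Channel
        .fromPath(params.input, checkIfExists: true)
        .map { tuple(it.name.split('.sorted')[0], it) }
        .set { input_ch }

    // One parallel task per chromosome per sample, instead of a shell loop.
    call_one_chrom(input_ch, 1..22)
}
```

The `1..22` list could equally be read from the reference's .fai index, as the other answer shows.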
