Question: How to iterate a command over many files inside a Docker container (RNA-seq analysis)
4 months ago, dodausp wrote:

Hi there,

I am using Docker containers to run a pipeline for RNA-seq analysis. Specifically, I am currently using Cufflinks to quantify and annotate the transcripts. I am dealing with 770 .bam files (aligned by TopHat) and running the following command for each one of them:

cufflinks -b hg19.fa -g hg19.refGene.gtf -u [sample.bam]

I would very much like to let Cufflinks run across all those samples without having to enter the command manually for each one. That would also let the machine run non-stop until it is done. So, my question is:

Is there any way to create an iteration procedure/command for that?

PS: I am running the Docker container in an interactive shell (with the -it arguments).

Any help is very much appreciated!

Tags: rna-seq, docker, quantification
4 months ago, Barry Digby (National University of Ireland, Galway) wrote:

I would highly recommend using Nextflow or some other workflow manager, as they are designed precisely for the task you are describing.

You can write a simple script that reads in the BAM files and executes the cufflinks command on each of them. Passing the -with-docker flag tells Nextflow to run each task inside the container where your tools are.
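As a rough illustration of the launch step: assuming the script is saved as cufflinks.nf and the tools live in an image tagged my-rnaseq-image (both names are placeholders, not taken from this thread), the call could be wrapped as below. Note that Nextflow itself is started on the host rather than from inside an interactive container; with -with-docker it starts a container for each task. The -resume option is an optional extra that lets a restarted run reuse tasks that already finished.

```shell
# Hypothetical helper: launch a Nextflow script with Docker enabled.
# $1 = path to the Nextflow script, $2 = Docker image holding cufflinks.
launch_pipeline() {
    # -with-docker runs every process inside the given image;
    # -resume skips tasks whose cached results are still valid.
    nextflow run "$1" -with-docker "$2" -resume
}
```

It would then be called as: launch_pipeline cufflinks.nf my-rnaseq-image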

The script could look something like this:

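#!/usr/bin/env nextflow
nextflow.enable.dsl = 2

// Input locations; each can be overridden on the command line (--bams, --genome, --gtf)
params.bams   = "/path/to/bams/*.bam"
params.genome = "/path/to/hg19.fa"
params.gtf    = "/path/to/hg19.gtf"

process cufflinks {
    // Publish into one folder per sample: cufflinks writes the same file
    // names for every run, so a shared folder would be overwritten each time
    publishDir "Results/${bam.baseName}", mode: 'copy'

    input:
    path bam
    path genome
    path gtf

    output:
    path "*.gtf"

    script:
    """
    cufflinks -b $genome -g $gtf -u $bam
    """
}

workflow {
    // One cufflinks task is created per BAM file matched by the glob
    bams = Channel.fromPath(params.bams)
    cufflinks(bams, file(params.genome), file(params.gtf))
}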

That should be more than enough to get you up and running. Nextflow is easy to install, and you will not run into sudo problems (assuming you are on an HPC cluster, given the 700+ BAM files).
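If a workflow manager is more than the job needs, a plain shell loop inside the interactive container would also cover the original question. The following is only a minimal sketch under a few assumptions: the function name run_cufflinks_all and the one-folder-per-sample output layout are illustrative choices, and the cufflinks options are simply the ones from the question plus -o to set the output directory.

```shell
# Run cufflinks once per BAM file found in a directory.
# Usage: run_cufflinks_all BAM_DIR GENOME_FASTA ANNOTATION_GTF OUT_DIR
run_cufflinks_all() {
    bam_dir=$1 genome=$2 gtf=$3 out_dir=$4
    for bam in "$bam_dir"/*.bam; do
        # Skip the unexpanded pattern when the directory holds no BAM files.
        [ -e "$bam" ] || continue
        sample=${bam##*/}       # strip the directory part
        sample=${sample%.bam}   # strip the .bam extension
        # One output folder per sample, so results do not overwrite each other.
        mkdir -p "$out_dir/$sample"
        # Report a failure but carry on with the remaining samples.
        cufflinks -b "$genome" -g "$gtf" -u -o "$out_dir/$sample" "$bam" \
            || echo "cufflinks failed for $bam" >&2
    done
}
```

With the files from the question it could be called as: run_cufflinks_all /path/to/bams hg19.fa hg19.refGene.gtf results. Samples run one after another, so this keeps the machine busy unattended but does not parallelise or resume the way a workflow manager does.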


Thanks a lot, @Barry Digby! This is the first time I have heard of Nextflow. I will definitely give it a go and post a follow-up here. Many thanks again!

Reply written 4 months ago by dodausp

No problem. Let me know if you run into trouble :)

Reply written 4 months ago by Barry Digby