Question

Problem when having multiple Tophat runs inside one workflow in Galaxy

7.8 years ago

tunl ▴ 80

I created a workflow in Galaxy that contains two Tophat runs, one for each of two FASTQ read files. The two output BAM files from these Tophat runs are the inputs to Cuffdiff.

When I have only one Tophat run in the workflow, each of the two FASTQ files runs successfully through Tophat. However, when I have two Tophat runs for these two FASTQ files inside one workflow, I get an out-of-memory error at bowtie2-align (at the step “Mapping left_kept_reads.m2g_um to genome mm10 with Bowtie2”), as follows:

Error running bowtie:
Out of memory allocating the offs[] array  for the Bowtie index.
Please try again on a computer with more memory.
Error: Encountered internal Bowtie 2 exception (#1)
(ERR): bowtie2-align exited with value

Although there are two Tophat runs inside one workflow, I suppose the two Tophat runs should be executed sequentially, as they appear as sequential steps on the run form and run history.

So I don’t understand why there is no out-of-memory problem when there is only one Tophat run in the workflow, but there is an out-of-memory problem when there are two Tophat runs inside one workflow.

Does Galaxy actually run the two Tophat jobs in parallel, even though they are shown as sequential steps in the run history?

If so, what can I do to make sure that the two Tophat jobs are executed sequentially, so that there is no out-of-memory problem?

I’d greatly appreciate any ideas and suggestions.

Thank you very much!

RNA-seq Galaxy Tophat • 2.1k views

Could one of the runs possibly be crashing and hanging? It sounds like some kind of memory leak...

ADD REPLY • link 7.8 years ago by Brian Bushnell 20k

Is this running on the public Galaxy at PSU? It is possible that Galaxy is allocating a certain amount of RAM to your workflow and that, after the first job completes, that memory is not being cleanly recovered for the second. You may also be able to specify that the second Tophat run not start until the first one completes. Just thinking out loud.

In any case, posting this over at Galaxy Biostars may be more appropriate.

ADD REPLY • link 7.8 years ago by GenoMax 141k

Thank you very much for the suggestion!

It is actually running on a local Galaxy.

What’s weird is that the out-of-memory problem occurs in the first Tophat run, not in the second. The second Tophat run actually succeeded after the first run failed due to insufficient memory. Also, when I run them separately, each Tophat run succeeds.

This makes me suspect that Galaxy is not truly running the two Tophat jobs sequentially.

Is there a way to specify that the second Tophat run does not start until the first one completes?

Thank you very much!

ADD REPLY • link 7.8 years ago by tunl ▴ 80
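
For readers following the reasoning in this thread, here is a minimal sketch of why two workflow steps that do not depend on each other can end up running at the same time. It is a toy model, not Galaxy's actual scheduler: the function `schedule`, the step names, and the `max_concurrent` parameter are all hypothetical and exist only for illustration. The assumption is that the two Tophat steps share no input or output, so nothing forces one to wait for the other unless the number of concurrent jobs is capped. The observation that the first run failed while the second succeeded would be consistent with both jobs loading the Bowtie index at once, though the thread does not confirm this.

```python
from typing import Dict, List, Set


def schedule(deps: Dict[str, Set[str]], max_concurrent: int) -> List[List[str]]:
    """Group steps into batches; steps in one batch may run at the same time.

    deps maps each step name to the set of steps it must wait for.
    max_concurrent is a hypothetical cap on simultaneously running jobs.
    """
    done: Set[str] = set()
    batches: List[List[str]] = []
    while len(done) < len(deps):
        # A step is ready once every step it depends on has finished.
        ready = sorted(s for s, d in deps.items() if s not in done and d <= done)
        if not ready:
            raise ValueError("cyclic dependency between steps")
        batch = ready[:max_concurrent]
        batches.append(batch)
        done.update(batch)
    return batches


# Toy version of the workflow described above: two independent Tophat
# steps feeding one Cuffdiff step (step names are illustrative only).
workflow = {
    "tophat_1": set(),
    "tophat_2": set(),
    "cuffdiff": {"tophat_1", "tophat_2"},
}

print(schedule(workflow, max_concurrent=2))
# [['tophat_1', 'tophat_2'], ['cuffdiff']]
print(schedule(workflow, max_concurrent=1))
# [['tophat_1'], ['tophat_2'], ['cuffdiff']]
```

With a cap of 2, both Tophat steps land in the same batch, so their memory use would add up. With a cap of 1, they run one after the other. On a local Galaxy, a comparable cap on concurrent jobs would normally be set in the server's job configuration; the exact setting depends on the Galaxy version and job runner, so it is worth checking the admin documentation rather than relying on this sketch.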