Question: Why is DESeq2 slower in parallel mode than in normal mode?
22 months ago by
ilikebing200050 wrote:

Hi everyone, I have 417 samples from 4 groups, and each sample contains expression values for 500 genes (my data is a 500x417 matrix). I want to run a differential expression analysis on it.

When I run DESeq in normal mode (parallel=FALSE), it takes ~137 seconds to finish.

When I run DESeq in parallel mode (parallel=TRUE) after calling register(SnowParam()) with 28 workers using BiocParallel, it takes ~406 seconds to finish.

When I run DESeq in parallel mode (parallel=TRUE) after calling register(MulticoreParam()) with 28 workers using BiocParallel, it takes ~405 seconds to finish.

Why is DESeq slower in parallel mode?
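
For readers who want to reproduce this kind of comparison, here is a minimal sketch of the timing setup described above. It is not the poster's actual script: the data are simulated with `makeExampleDESeqDataSet()` (two conditions rather than four groups), the sample count is reduced to 48 to keep the run short, and 4 workers are used instead of 28. All of these values are assumptions chosen for illustration.

```r
library(DESeq2)
library(BiocParallel)

# Simulated stand-in data (assumption): 500 genes x 48 samples.
# Set m = 417 to mirror the dimensions in the question (much slower).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 48)

# Serial run (parallel = FALSE is the default).
t_serial <- system.time(
  DESeq(dds, quiet = TRUE)
)["elapsed"]

# Parallel run with the backend passed explicitly through BPPARAM.
# MulticoreParam relies on forking, so it is not available on Windows;
# SnowParam(workers = 4) is the usual alternative there.
t_parallel <- system.time(
  DESeq(dds, parallel = TRUE,
        BPPARAM = MulticoreParam(workers = 4), quiet = TRUE)
)["elapsed"]

# Elapsed seconds for both runs, side by side.
c(serial = unname(t_serial), parallel = unname(t_parallel))
```

Passing the backend through `BPPARAM` instead of `register()` keeps each timed call self-contained, which makes it easier to repeat the comparison with different worker counts.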

ADD COMMENTlink modified 22 months ago by Michael Love1.8k • written 22 months ago by ilikebing200050
22 months ago by
Michael Love1.8k wrote:

Can you test to see that your parallel setup is ok? For example:

 > register(SerialParam())
 > system.time({ bplapply(1:4, function(i) Sys.sleep(5)) })
    user  system elapsed
   0.016   0.004  20.020
 > register(MulticoreParam(workers=4))
 > system.time({ bplapply(1:4, function(i) Sys.sleep(5)) })
    user  system elapsed
   0.010   0.017   6.203
ADD COMMENTlink written 22 months ago by Michael Love1.8k
register(SerialParam())
system.time({ bplapply(1:4, function(i) Sys.sleep(5)) })
   user  system elapsed
  0.076   0.060  20.031
----
register(MulticoreParam(workers=4))
system.time({ bplapply(1:4, function(i) Sys.sleep(5)) })
   user  system elapsed
  0.176   0.552   9.608
----
register(SerialParam())
system.time({ bplapply(1:28, function(i) Sys.sleep(5)) })
   user  system elapsed
  0.568   0.352 140.068
----
register(MulticoreParam(workers=28))
system.time({ bplapply(1:28, function(i) Sys.sleep(5)) })
   user  system elapsed
  0.316   3.784  17.433
----

I'm not sure: is this OK?

ADD REPLYlink modified 22 months ago by genomax64k • written 22 months ago by ilikebing200050

So the overhead of simply starting 28 workers keeps you from achieving a 28-fold speedup; instead you get a speedup of about 8 for the toy example of sleeping for five seconds. This overhead might matter less as the time per task increases, but with real data you also have to split up the data and send a piece to each worker. I'd try DESeq2 with a smaller number of workers, and if you are working on a cluster, you could make sure that the cores are on the same node. The details of the backend make a difference.
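
The speedup figure above can be checked with a little arithmetic on the elapsed times posted in the thread. The sketch below uses only those reported numbers; the "efficiency" column (speedup divided by worker count) is an added illustration, not something computed in the original posts.

```r
# Elapsed times (seconds) reported in the thread for the
# bplapply() + Sys.sleep(5) toy benchmark.
timings <- data.frame(
  workers  = c(4, 28),
  serial   = c(20.031, 140.068),
  parallel = c(9.608, 17.433)
)

# Speedup = serial time / parallel time.
timings$speedup <- timings$serial / timings$parallel

# Efficiency = fraction of the ideal (workers-fold) speedup achieved.
timings$efficiency <- timings$speedup / timings$workers

print(round(timings, 2))
```

With these numbers, 4 workers give roughly a 2.1-fold speedup and 28 workers roughly an 8.0-fold speedup, so the per-worker efficiency drops as workers are added even for a task with no data transfer at all.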

ADD REPLYlink written 22 months ago by Michael Love1.8k

Thanks for your help.

ADD REPLYlink modified 22 months ago • written 22 months ago by ilikebing200050