Question: Why is DESeq2 slower in parallel mode than in normal mode?
2.5 years ago, ilikebing2000 wrote:

Hi everyone, I have 417 samples from 4 groups. Each sample contains expression values for 500 genes (so my data is a 500 x 417 matrix), and I want to run differential expression analysis on it.

When I run DESeq in normal mode (parallel=FALSE), it takes ~137 seconds to finish.

When I run DESeq in parallel mode (parallel=TRUE) after calling register(SnowParam()) with 28 workers from BiocParallel, it takes ~406 seconds to finish.

When I run DESeq in parallel mode (parallel=TRUE) after calling register(MulticoreParam()) with 28 workers from BiocParallel, it takes ~405 seconds to finish.

Why is DESeq slower in parallel mode?

modified 2.5 years ago by Michael Love • written 2.5 years ago by ilikebing2000
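A minimal sketch of how this comparison could be reproduced on a smaller scale, assuming DESeq2 and BiocParallel are installed. The simulated counts from makeExampleDESeqDataSet() are only a stand-in for the real 500 x 417 matrix (they have 2 conditions, not 4), the helper time_deseq() is hypothetical, and the backend is passed directly through DESeq()'s BPPARAM argument rather than via register(). MulticoreParam is not supported on Windows, where SnowParam would be used instead.

```r
suppressPackageStartupMessages({
  library(DESeq2)
  library(BiocParallel)
})

set.seed(1)
# Assumption: simulated stand-in data, 500 genes x 40 samples in two conditions
dds <- makeExampleDESeqDataSet(n = 500, m = 40)

# Hypothetical helper: elapsed seconds for one DESeq() run with `workers` cores
time_deseq <- function(dds, workers) {
  if (workers == 1) {
    t <- system.time(DESeq(dds, quiet = TRUE))
  } else {
    # Pass the backend explicitly instead of relying on register()
    bp <- MulticoreParam(workers = workers)
    t <- system.time(DESeq(dds, parallel = TRUE, BPPARAM = bp, quiet = TRUE))
  }
  t[["elapsed"]]
}

worker_counts <- c(1, 2, 4)  # illustrative; raise toward 28 on the real machine
timings <- data.frame(
  workers = worker_counts,
  elapsed = vapply(worker_counts, function(w) time_deseq(dds, w), numeric(1))
)
print(timings)
```

Comparing the elapsed column across worker counts shows where adding workers stops paying off for a given data size; the actual numbers depend on the machine and backend.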
2.5 years ago, Michael Love (United States) wrote:

Can you check that your parallel setup is working? For example:

 > register(SerialParam())
 > system.time({ bplapply(1:4, function(i) Sys.sleep(5)) })
    user  system elapsed
   0.016   0.004  20.020
 > register(MulticoreParam(workers=4))
 > system.time({ bplapply(1:4, function(i) Sys.sleep(5)) })
    user  system elapsed
   0.010   0.017   6.203
written 2.5 years ago by Michael Love
 > register(SerialParam())
 > system.time({ bplapply(1:4, function(i) Sys.sleep(5)) })
    user  system elapsed
   0.076   0.060  20.031

 > register(MulticoreParam(workers=4))
 > system.time({ bplapply(1:4, function(i) Sys.sleep(5)) })
    user  system elapsed
   0.176   0.552   9.608

 > register(SerialParam())
 > system.time({ bplapply(1:28, function(i) Sys.sleep(5)) })
    user  system elapsed
   0.568   0.352 140.068

 > register(MulticoreParam(workers=28))
 > system.time({ bplapply(1:28, function(i) Sys.sleep(5)) })
    user  system elapsed
   0.316   3.784  17.433

I'm not sure; is this OK?

modified 2.5 years ago by genomax • written 2.5 years ago by ilikebing2000
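One way to answer "is this OK?" with numbers is to compute the speedup (serial time divided by parallel time) and the parallel efficiency (speedup divided by the number of workers) from the four timings above. A small base R sketch, using only the elapsed values reported in the reply:

```r
# Elapsed seconds from the Sys.sleep(5) test reported above
workers    <- c(4, 28)
serial_s   <- c(20.031, 140.068)  # SerialParam()
parallel_s <- c(9.608, 17.433)    # MulticoreParam(workers = 4) and (workers = 28)

speedup    <- serial_s / parallel_s  # how many times faster than serial
efficiency <- speedup / workers      # fraction of the ideal speedup (1 = perfect)

# speedup is about 2.08 with 4 workers and about 8.03 with 28 workers
print(round(data.frame(workers, speedup, efficiency), 2))
```

An efficiency near 1 would mean each extra worker pays for itself; values around 0.5 and 0.3 indicate that a large share of the wall-clock time goes to parallel overhead rather than useful work.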

So the overhead of simply launching 28 workers keeps you from achieving a 28-fold speedup; instead you get roughly an 8-fold speedup (140 s serial vs. 17.4 s parallel) on the toy example of sleeping for five seconds. This might improve as the time per task increases, but with real data you also have to split up the data and send it to each worker. I'd try DESeq2 with a smaller number of workers, and if you are working on a cluster, make sure that the cores are on the same node. The details of the backend make a difference.
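To look at that overhead in isolation, one option is to time bplapply() on tasks that do essentially no work, so nearly all of the elapsed time is spent starting workers, sending the function and inputs, and collecting results. This is a sketch assuming BiocParallel is installed; the helper overhead() is hypothetical, and the worker counts are kept small here (substituting 28 would mirror the question). MulticoreParam is not supported on Windows.

```r
suppressPackageStartupMessages(library(BiocParallel))

# Hypothetical helper: elapsed seconds for bplapply() over trivial tasks,
# so the measured time is dominated by backend overhead
overhead <- function(BPPARAM, n_tasks = 28) {
  system.time(bplapply(seq_len(n_tasks), function(i) i, BPPARAM = BPPARAM))[["elapsed"]]
}

# Illustrative backends; raise the worker counts toward 28 to match the question
backends <- list(
  serial     = SerialParam(),
  multicore2 = MulticoreParam(workers = 2),
  multicore4 = MulticoreParam(workers = 4),
  snow2      = SnowParam(workers = 2)
)

res <- sapply(backends, overhead)
print(round(res, 3))
```

SnowParam starts fresh R sessions, so it typically shows more startup cost than the forked MulticoreParam workers; that fixed cost is paid on every DESeq() call regardless of how little work each worker ends up doing.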

written 2.5 years ago by Michael Love

Thanks for your help.

modified 2.5 years ago • written 2.5 years ago by ilikebing2000