Running DESeq2 parallelized with pre-computed dispersions
2.6 years ago

Hello,

I am running DESeq2 on thousands of large datasets which have many overlapping samples (shared controls). As such, I would like to estimate dispersions once, and use those precomputed dispersions instead of recomputing for each dataset to save time. (I realize this may lead to some suboptimal dispersion estimates but am willing to accept that error).

I realize that I could add these precomputed dispersions to an existing dataset and call nbinomLRT to run the negative binomial GLM with the precomputed dispersions. However, because of the size of the dataset, I would also like to parallelize this, and based on this post, it seems that lower-level functions like nbinomLRT are not themselves parallelized.

My question is: is there a way to run the DESeq function with pre-computed dispersion estimates? If I naively implement this, I get the message "found already estimated dispersions, replacing these".
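For reference, what I have tried looks roughly like this (a sketch, assuming `counts_mat`, `coldata`, and `disp_precomputed` are my count matrix, sample table, and pooled per-gene dispersion vector; those names are placeholders):

```r
library(DESeq2)

# Hypothetical inputs: 'counts_mat' (genes x samples), 'coldata' with a
# 'condition' column, and 'disp_precomputed', a per-gene dispersion vector
# estimated once on the pooled data (same gene order as counts_mat).
dds <- DESeqDataSetFromMatrix(countData = counts_mat,
                              colData   = coldata,
                              design    = ~ condition)
dds <- estimateSizeFactors(dds)

# Assign the precomputed dispersions instead of calling estimateDispersions();
# running DESeq() instead would re-estimate them, which triggers the
# "found already estimated dispersions, replacing these" message.
dispersions(dds) <- disp_precomputed

# Fit the GLMs directly with the supplied dispersions
dds <- nbinomWaldTest(dds)
res <- results(dds)
```

This works serially, but as noted above, nbinomWaldTest/nbinomLRT do not take a `parallel` argument the way DESeq() does.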

Thanks, and apologies if this has been covered elsewhere in the forum/docs.


If the datasets are large, why not run a faster linear approach such as limma-trend? As for shared controls: does that make sense for your design, and are you aware of possible batch effects?
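For context, the limma-trend route would look roughly like this (a sketch assuming a count matrix `counts_mat` and a model matrix `design`; both are placeholder names):

```r
library(edgeR)   # DGEList, calcNormFactors, cpm
library(limma)

# Hypothetical inputs: raw counts 'counts_mat' and model matrix 'design'
dge <- DGEList(counts = counts_mat)
dge <- calcNormFactors(dge)

# log-CPM values; the prior count damps variability of low counts
logCPM <- cpm(dge, log = TRUE, prior.count = 3)

# Ordinary linear model fit, then empirical Bayes with a mean-variance trend
fit <- lmFit(logCPM, design)
fit <- eBayes(fit, trend = TRUE)   # trend = TRUE is what makes it limma-trend
topTable(fit, coef = 2)
```

Because the per-gene fits are just linear models on log-CPM, this is typically much faster than fitting NB GLMs across thousands of datasets.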


Hi ATpoint--thanks for your message!

As far as limma-trend goes, I know that limma generally identifies DEGs that overlap with those from DESeq2, but for my project I am very interested in having precise estimates of the LFC and its standard error for all genes, including lowly expressed ones. To that end, I am concerned about the distributional assumptions of limma versus the count-based NB model in DESeq2. That said, admittedly I have tested limma less and may be wrong to be worried here.

I believe that sharing controls makes sense in my case. The design has hundreds of batches, and each batch contains some number of control samples as well as hundreds of intermingled samples with different perturbations. Therefore, even though I am sharing controls, I am still able to correct for batch differences. In fact, I believe it is correcting for this large number of batches that is causing the long runtime.
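One workaround I have been sketching, since the lower-level fitting functions are not parallelized themselves, is to chunk genes manually and run the fits per chunk with BiocParallel (a sketch, assuming `dds` already carries size factors and my precomputed dispersions; chunk and worker counts are arbitrary):

```r
library(DESeq2)
library(BiocParallel)

# Hypothetical: 'dds' already has size factors and precomputed dispersions
# assigned via dispersions(dds) <- ...; subsetting rows keeps both.
n <- nrow(dds)
idx_chunks <- split(seq_len(n), cut(seq_len(n), 8))  # 8 gene chunks

res_list <- bplapply(idx_chunks, function(i) {
  sub <- nbinomWaldTest(dds[i, ])  # serial fit on this chunk only
  results(sub)
}, BPPARAM = MulticoreParam(workers = 4))

# Reassemble a genome-wide results table
res <- do.call(rbind, res_list)
```

I have not verified that per-chunk fits are numerically identical to a single whole-matrix fit, so I would welcome corrections if this subsetting approach is unsound.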

