Question: RNA-Seq: using GLMM to detect differentially expressed genes
alesssia (London, UK) wrote, 4.9 years ago:

Hi All.

I have a set of raw count data and I am interested in using (G)LMMs to detect differentially expressed genes. However, I have a number of questions about how to set up the best (correct?) pipeline for this task.

1. I am aware that using linear models (instead of well-known tools, such as DESeq2) will give me less power -- unless I have a large set of samples. I know that this is a dumb question, but how many samples count as "large"?

2. To get meaningful results, I believe that a filtering step and a normalisation step are needed beforehand. Is this assumption correct? What is a reliable approach to filter/normalise my data?

3. Might it be useful to work with transformed versions of the count data?

4. I usually use LMMs (lme4 R package) when looking for differentially expressed genes in the context of microarray data -- I work with multiplex family data and I want to correct for samples' relatedness. However, when RNA-Seq counts are at hand, is it better to use zero-inflated Poisson models? Or can I assume that there is only an overdispersion problem? Can the answer to this question be data-dependent?

Thanks in advance for your help,


Devon Ryan (Freiburg, Germany) wrote, 4.9 years ago:
  1. I suspect that Gordon Smyth has given a recommendation on this somewhere, though I haven't ever come across it. My gut says you should have a hundred samples or so, but that should be taken with a large grain of salt without empirical data. I should note that you'll always have lower power without sharing information across genes; it's just a question of how much you've lost. Of course, the more complicated the model, the more samples you'd really need to have.
  2. Normalization yes, filtering no. Well, filtering beyond removing rows with 0 counts (or anything else that would break the (G)LMM function) isn't necessary. You'll need to perform a library-size normalization. The most straightforward way to do this is to first use DESeq or DESeq2 and extract the resulting size factors with sizeFactors(). These can then be used as weights in your GLMM. You can perform independent filtering after the fact, once you have raw p-values. The genefilter package is convenient for this.
  3. Possibly. If you run everything through limma::voom() first, then you'd have data in a nice format for a more traditional LMM.
  4. I've not seen much of any gain from zero-inflated models over "simple" negative binomial models. There are a couple of papers out there comparing negative binomial, zero-inflated negative binomial, and zero-inflated Poisson models if you want some hard numbers on this.
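
To make the library-size normalization in point 2 concrete, here is a minimal sketch of the median-of-ratios idea behind DESeq-style size factors, written in plain Python rather than R. The function name size_factors and the toy count table are made up for illustration; in practice you would take the values from DESeq/DESeq2 directly. The last line shows one common convention, assumed here and not stated above, of passing log(size factor) as an offset in a count model.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count table.

    counts: list of rows, one row per gene, one column per sample.
    (Hypothetical helper for illustration; not a DESeq2 function.)
    """
    n_samples = len(counts[0])
    # Only genes with a positive count in every sample enter the reference,
    # so that the log geometric mean is defined.
    log_rows = [
        [math.log(c) for c in row]
        for row in counts
        if all(c > 0 for c in row)
    ]
    if not log_rows:
        raise ValueError("no gene has a positive count in every sample")
    # Log of each gene's geometric mean across samples.
    log_geo_means = [sum(row) / n_samples for row in log_rows]
    factors = []
    for j in range(n_samples):
        # Per-gene log ratio of this sample to the reference; take the median.
        log_ratios = [row[j] - g for row, g in zip(log_rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy data: sample 2 is sequenced twice as deeply as sample 1, sample 3 four times.
counts = [
    [10, 20, 40],
    [100, 200, 400],
    [5, 10, 20],
    [0, 3, 7],  # has a zero, so it is left out of the reference
]
sf = size_factors(counts)

# Assumed convention: log size factors used as an offset in a count GLMM.
log_offsets = [math.log(s) for s in sf]
```

With this toy table the factors come out close to 0.5, 1 and 2, i.e. they track sequencing depth relative to a pseudo-reference sample.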
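
The "independent filtering after the fact" step in point 2 can be sketched like this: fit every gene, collect the raw p-values, drop genes using a criterion that does not look at the test result (mean count is assumed here), and only then apply the multiple-testing correction. This is a plain Python illustration with a hand-written Benjamini-Hochberg adjustment; the function names, the toy p-values and the cutoff of 5 are all made up, and the genefilter package offers its own tools for choosing the cutoff.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_then_adjust(pvalues, mean_counts, min_mean):
    """Adjust only genes whose mean count passes the cutoff; others get None.

    (Hypothetical helper; the filter statistic and cutoff are assumptions.)
    """
    keep = [i for i, mu in enumerate(mean_counts) if mu >= min_mean]
    adj = bh_adjust([pvalues[i] for i in keep])
    result = [None] * len(pvalues)
    for i, a in zip(keep, adj):
        result[i] = a
    return result


# Toy raw p-values from per-gene model fits, plus each gene's mean normalized count.
pvals = [0.001, 0.02, 0.03, 0.8, 0.5]
mean_counts = [250.0, 90.0, 40.0, 1.5, 0.7]

padj_all = bh_adjust(pvals)
padj_filtered = filter_then_adjust(pvals, mean_counts, min_mean=5.0)
```

Because the two low-count genes no longer count toward the number of tests, the adjusted p-values of the remaining genes get smaller, which is where the power gain from filtering comes from.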

alesssia replied, 4.9 years ago:

Thank you very much, Devon: your answers are very helpful!