Question: RNA-seq gene expression analysis using 0-counts
Asked 4.8 years ago by johntlovell (United States):

Hi Folks.

I am conducting a differential gene expression analysis using RNA-seq. My experimental design is blocked and repeated, so I need to fit mixed-effects models and cannot use standard DGE packages such as DESeq or edgeR. This is not a problem when the count data fit a negative binomial (or Poisson, etc.) distribution; however, for many of the genes, the count data are highly zero-inflated or effectively binary. For example, many genes have 0 counts in one parent and >5 counts in the other parent. Please advise on the best way to analyze genes that behave this way.
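To make the pattern described above concrete, here is a minimal sketch that flags genes with all-zero counts in one parent and counts above a threshold in the other. The function name `flag_presence_absence`, the `min_count` parameter, and the toy counts are all hypothetical and chosen only for illustration; the threshold of 5 simply mirrors the ">5 counts" in the example.

```python
def flag_presence_absence(counts_a, counts_b, min_count=5):
    """Return 'a' or 'b' for the parent in which the gene is expressed
    when the other parent has only zero counts, otherwise None.

    A gene is flagged only if every replicate of one parent is 0 and
    every replicate of the other parent is strictly above min_count.
    """
    if not counts_a or not counts_b:
        return None  # nothing to compare
    a_zero = all(c == 0 for c in counts_a)
    b_zero = all(c == 0 for c in counts_b)
    a_on = all(c > min_count for c in counts_a)
    b_on = all(c > min_count for c in counts_b)
    if a_zero and b_on:
        return "b"
    if b_zero and a_on:
        return "a"
    return None


# Toy data (invented): gene -> (parent A replicates, parent B replicates)
genes = {
    "gene1": ([0, 0, 0, 0], [12, 9, 7, 15]),
    "gene2": ([3, 0, 5, 2], [4, 1, 0, 6]),
    "gene3": ([8, 11, 6, 9], [0, 0, 0, 0]),
}

flags = {gene: flag_presence_absence(a, b) for gene, (a, b) in genes.items()}
for gene, flag in flags.items():
    print(gene, flag)
```

Separating such genes first makes it possible to treat them as presence/absence calls, while the remaining genes go through the count-based model.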

Thanks, John

  1. Are you sure you actually need to use a mixed-effect model? Given that DESeq2/edgeR/etc. use shrinkage, a mixed-effect model is unlikely to benefit you.
  2. Have a look at limma's duplicateCorrelation() function.
Comment written 4.8 years ago by Devon Ryan

Thanks Devon. This comment has come up in many of the posts that I have read. 

For me, when an experiment is designed with blocking and replication within the individual, the individual and the experimental block must be modeled as random effects. This is a pretty standard quantitative genetics design. Furthermore, we have a lot of replication within the levels of the experimental factors we are testing, so I am not convinced that shrinkage is a particularly good way to estimate within-group variances.

Anyways, even if I did use fixed effects, I am still unsure about the best way to analyze these highly 0-inflated and binary gene expression phenotypes. Thanks again.

Reply written 4.8 years ago by johntlovell

Certainly if you were to compare a straight GLM and a GLMM on your dataset then the GLMM would work better...but of course a GLMM is just doing shrinkage in a different way than DESeq2 et al., which aren't straight GLMs.

Regarding the zeroes, it depends a bit on exactly what you mean by zero-inflated and where the problem is. If you have absolutely 0 expression in all but one sample, then that can be problematic. I suppose how to deal with that depends on whether you find those cases biologically interesting. For most people they wouldn't be, but I can think of counterexamples (e.g., single-cell sequencing).
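One way to check "where the problem is" is to compare the observed fraction of zeros with the fraction expected under a fitted count distribution, both pooled and within each group. The following is only a rough sketch under stated assumptions: it uses a method-of-moments negative binomial fit (falling back to Poisson when the sample variance does not exceed the mean), ignores library-size normalization and the blocking structure, and the function names and toy counts are hypothetical.

```python
import math
from statistics import mean, variance


def expected_zero_fraction(counts):
    """Expected P(count == 0) under a method-of-moments fit.

    Negative binomial if the sample variance exceeds the mean,
    otherwise Poisson. Requires at least two observations.
    """
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    if m == 0:
        return 1.0  # all zeros: a zero is certain under the fitted model
    if v <= m:
        return math.exp(-m)  # Poisson: P(0) = exp(-mean)
    size = m * m / (v - m)  # NB "size" from var = m + m^2 / size
    return (size / (size + m)) ** size  # NB: P(0) = (size / (size + m))^size


def zero_summary(counts):
    """Return (observed zero fraction, expected zero fraction)."""
    observed = sum(1 for c in counts if c == 0) / len(counts)
    return observed, expected_zero_fraction(counts)


# Toy counts for one gene (invented): silent in parent A, expressed in parent B
parent_a = [0, 0, 0, 0, 0, 0]
parent_b = [9, 12, 7, 15, 8, 11]

for label, counts in [("pooled", parent_a + parent_b),
                      ("parent A", parent_a),
                      ("parent B", parent_b)]:
    observed, expected = zero_summary(counts)
    print(f"{label}: observed zeros = {observed:.2f}, expected = {expected:.4f}")
```

In this toy case the pooled counts show far more zeros than the fitted distribution predicts, but neither parent does on its own. That suggests the excess zeros reflect a group difference (presence versus absence) rather than zero inflation within a group, which is a different modeling problem.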

Reply written 4.8 years ago by Devon Ryan