RNA-seq gene expression analysis using 0-counts
9.4 years ago
johntlovell ▴ 10

Hi Folks.

I am conducting a differential gene expression analysis using RNA-seq. My experimental design is blocked and repeated, so I need to fit mixed-effects models and cannot use standard DGE packages such as DESeq, edgeR, etc. This is not a problem when the count data can reasonably be modeled with a negative binomial (or Poisson, etc.) distribution; however, for many of the genes, I have highly zero-inflated or essentially binary count data. For example, for many genes there are 0 counts for one parent and >5 counts for the other parent. Please advise on the best way to analyze genes that behave this way.

Thanks, John
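
The on/off pattern described above (0 counts for one parent, >5 counts for the other) can be screened for before any model is fitted. The following is only a minimal sketch in plain Python: the function name `is_on_off`, the `min_on` threshold, and the toy counts are all hypothetical and are not taken from the poster's data or from any package.

```python
from typing import Dict, List, Tuple


def is_on_off(counts_a: List[int], counts_b: List[int], min_on: int = 5) -> bool:
    """Return True if one group is all zeros while every replicate of the
    other group has more than `min_on` counts (assumed threshold)."""
    a_off = all(c == 0 for c in counts_a)
    b_off = all(c == 0 for c in counts_b)
    a_on = all(c > min_on for c in counts_a)
    b_on = all(c > min_on for c in counts_b)
    return (a_off and b_on) or (b_off and a_on)


# Hypothetical toy data: gene -> (parent A replicates, parent B replicates)
toy_counts: Dict[str, Tuple[List[int], List[int]]] = {
    "gene1": ([0, 0, 0, 0], [12, 9, 15, 7]),       # off in A, on in B
    "gene2": ([30, 25, 41, 28], [33, 29, 38, 31]),  # expressed in both
    "gene3": ([0, 2, 0, 1], [0, 0, 3, 0]),          # sparse, low counts in both
}

on_off_genes = [gene for gene, (a, b) in toy_counts.items() if is_on_off(a, b)]
print(on_off_genes)
```

Genes flagged this way could then be set aside and handled separately from the genes that a count model fits well; how to test them is the open question of this thread.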

  1. Are you sure you actually need to use a mixed-effect model? Given that DESeq2/edgeR/etc. use shrinkage, a mixed-effect model is unlikely to benefit you.
  2. Have a look at limma's duplicateCorrelation() function.

Thanks Devon. This comment has come up in many of the posts that I have read.

For me, when an experiment is designed with blocking and replication within the individual, the individual and the experimental block must be analyzed as random effects. This is a pretty standard quantitative genetics design. Furthermore, we have a ton of replication within each level of the experimental factors we are testing, so I am not convinced that shrinkage is a particularly good way to estimate within-group variances.

Anyway, even if I did use fixed effects, I am still unsure about the best way to analyze these highly zero-inflated and binary gene expression phenotypes. Thanks again.


Certainly if you were to compare a straight GLM and a GLMM on your dataset then the GLMM would work better...but of course a GLMM is just doing shrinkage in a different way than DESeq2 et al., which aren't straight GLMs.

Regarding the zeroes, it depends a bit on exactly what you mean by zero-inflated and where the problem is. If you have absolutely 0 expression in all but one sample, then that can be problematic. I suppose how to deal with that depends on whether you find those cases biologically interesting. For most people they wouldn't be, but I can think of counterexamples (e.g., single-cell sequencing).
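
To see "where the problem is" for a given gene, it may help to tabulate how its zeros are distributed across groups. The sketch below is a rough illustration only: the function `zero_profile`, its labels, and the toy counts are assumptions made for this example, not part of any RNA-seq package.

```python
from typing import Dict, List


def zero_profile(counts_by_group: Dict[str, List[int]]) -> Dict[str, object]:
    """Summarise where the zeros of one gene fall across groups."""
    # Number of samples with at least one read, per group
    nonzero_per_group = {
        group: sum(1 for c in counts if c > 0)
        for group, counts in counts_by_group.items()
    }
    total_nonzero = sum(nonzero_per_group.values())
    n_samples = sum(len(counts) for counts in counts_by_group.values())

    if total_nonzero == 0:
        label = "all zero"
    elif total_nonzero == 1:
        label = "expressed in a single sample"
    elif any(n == 0 for n in nonzero_per_group.values()):
        label = "absent from at least one group"
    else:
        label = "expressed in every group"

    return {"zero_fraction": 1 - total_nonzero / n_samples, "label": label}


# Hypothetical toy genes with two parental groups of four samples each
gene_a = zero_profile({"P1": [0, 0, 0, 0], "P2": [0, 0, 6, 0]})
gene_b = zero_profile({"P1": [0, 0, 0, 0], "P2": [9, 7, 12, 6]})
gene_c = zero_profile({"P1": [0, 3, 0, 5], "P2": [2, 0, 0, 4]})

for name, profile in [("gene_a", gene_a), ("gene_b", gene_b), ("gene_c", gene_c)]:
    print(name, profile)
```

In this toy data, `gene_a` is the "zero in all but one sample" case, while `gene_b` is the "off in one parent, on in the other" case from the original question. `gene_b` and `gene_c` have the same overall fraction of zeros, so the zero fraction alone does not separate the two situations.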
