ANOVA for RNA-seq data?
17 months ago

Hello

I have seen some platforms, such as GEPIA, offer ANOVA for differential gene expression analysis. However, as far as I understand, ANOVA compares group means and assumes the same distribution and equal variance among samples, which, as far as I have been led to believe, is uncommon for any kind of RNA-seq-derived data, especially considering the thousands of genes that may be expressed in the human genome. Is ANOVA really appropriate for differential expression?

anova RNA-Seq • 2.1k views
ADD COMMENT
17 months ago

Nope. In addition, RNA-seq data are not normally distributed, which ANOVA also assumes. You should use specifically designed tools such as edgeR, DESeq2 or limma.

ADD COMMENT

limma is actually based on linear models/ANOVA under the hood. While raw RNA-seq counts are never normally distributed, limma uses log-transformed CPM (counts per million) values, which are normal enough for ANOVA. So yes, it is possible to analyze RNA-seq data with ANOVA, but I agree that it is rather sub-optimal compared to more modern methods such as DESeq2 and edgeR (which are based on negative binomial modeling of the raw counts).
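To make the log-CPM step concrete, here is a minimal sketch in plain Python. It assumes the offset form log2((count + 0.5) / (library size + 1) * 1e6), which is how the voom transformation is usually written; the function name `log2_cpm` and the toy matrix are made up for illustration and are not part of limma.

```python
import math

def log2_cpm(counts, prior=0.5):
    """Return log2 counts-per-million for a genes x samples matrix of raw counts."""
    n_samples = len(counts[0])
    # Library size = total counts in each sample (column sums).
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    # The small offsets keep zero counts finite after taking the log.
    return [
        [math.log2((row[j] + prior) / (lib_sizes[j] + 1.0) * 1e6)
         for j in range(n_samples)]
        for row in counts
    ]

# Hypothetical toy data: 3 genes x 2 samples; sample 2 is sequenced twice as deeply.
toy_counts = [
    [10, 20],
    [0, 0],
    [990, 1980],
]
toy_logcpm = log2_cpm(toy_counts)
```

After the transform, the first gene has almost the same value in both samples even though its raw count doubles, because the difference in sequencing depth is divided out.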

ADD REPLY

The issue is not so much "normality" (limma-voom and sleuth don't do negative binomial modeling, and they work very well). The issue lies with variance estimation, which is why limma does not use t-tests/ANOVA in the traditional sense: it uses those tests but regularizes the variance estimates, which is necessary in almost all cases.

Most differential gene expression packages support ANOVA-like comparisons, so just stick with those.
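As a rough illustration of what regularizing the variance means, the sketch below compares an ordinary two-sample t-statistic with a moderated one in which the gene-wise variance is shrunk toward a prior value. The shrinkage formula, (d0 * s0^2 + d * s^2) / (d0 + d), is the usual way the moderated variance is written; `prior_var` and `prior_df` are hypothetical fixed inputs here, whereas limma estimates them from all genes, which this sketch does not attempt.

```python
import math
from statistics import mean, variance

def pooled_variance(a, b):
    """Classical pooled variance of two groups and its residual degrees of freedom."""
    df = len(a) + len(b) - 2
    s2 = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / df
    return s2, df

def t_statistics(a, b, prior_var, prior_df):
    """Return (ordinary t, moderated t) for one gene measured in two groups."""
    s2, df = pooled_variance(a, b)
    # Shrink the gene-wise variance toward the prior, weighted by degrees of freedom.
    s2_mod = (prior_df * prior_var + df * s2) / (prior_df + df)
    scale = math.sqrt(1.0 / len(a) + 1.0 / len(b))
    diff = mean(b) - mean(a)
    return diff / (math.sqrt(s2) * scale), diff / (math.sqrt(s2_mod) * scale)

# Hypothetical log-expression values for a gene whose replicates agree
# unusually well by chance (3 vs 3 samples).
group_a = [5.00, 5.01, 4.99]
group_b = [5.20, 5.21, 5.19]
t_ordinary, t_moderated = t_statistics(group_a, group_b, prior_var=0.04, prior_df=4)
```

With only three replicates per group, the tiny sample variance makes the ordinary statistic very large for a modest difference in means; the moderated statistic is far smaller because the variance estimate borrows from the prior.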

ADD REPLY

Yes, I agree that limma is not 'classical' ANOVA but rather an extension of it. Still, I wanted to add some nuance to the clear-cut answer above, which states that ANOVA cannot be used for RNA-seq analysis because the counts are not normal.

Also, my understanding is that without the log transformation of the count data, normality would be an issue for linear model/ANOVA-based methods such as limma, but not for edgeR or DESeq2, since they make different distributional assumptions about the data.
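For reference, the different assumption can be sketched as a variance function. The negative binomial model is usually written with variance mean + dispersion * mean^2, compared with a Poisson variance equal to the mean. The dispersion value below is an arbitrary illustration, not an estimate from real data.

```python
def nb_variance(mean_count, dispersion):
    """Variance of a negative binomial count with the given mean and dispersion."""
    return mean_count + dispersion * mean_count ** 2

# Hypothetical gene with a mean of 1000 reads and a dispersion of 0.1.
poisson_var = nb_variance(1000, 0.0)   # dispersion 0 reduces to the Poisson case
overdispersed_var = nb_variance(1000, 0.1)
```

The quadratic term dominates for highly expressed genes, so the variance of raw counts grows much faster than the mean.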

ADD REPLY
14 days ago

According to A Beginner’s Guide to Analysis of RNA Sequencing Data (https://www.atsjournals.org/doi/10.1165/rcmb.2017-0430TR) ANOVA is an appropriate analysis for RNA-seq data. However, the review doesn't specify a tool/package to do this analysis. Searching for how to do ANOVA on RNA-seq data brought me to this page.

ADD COMMENT

Just because a paper does it doesn't mean it's a correct or statistically sound approach.

Even if you convert your RNA-seq data to log-transformed, roughly normal abundances, using t-tests or ANOVAs to find DE genes is still problematic.

ADD REPLY
