I have seen some platforms, such as GEPIA, offering ANOVA for differential gene expression analysis. However, as far as I understand, ANOVA compares group means and assumes that samples share the same distribution and variance, which, as far as I have been led to believe, is uncommon for any kind of RNA-seq-derived data, especially considering the thousands of possibly expressed genes in the human genome. Is ANOVA really appropriate for differential expression?

Nope. Also, the distribution of RNA-seq data is not normal (which ANOVA also assumes). You should use purpose-built tools such as edgeR, DESeq2 or limma.

limma is actually based on linear models/ANOVA under the hood. While raw RNA-seq counts are never normal, limma works on log-transformed CPM values, which are normal enough for ANOVA. So yes, it is possible to analyze RNA-seq data with ANOVA, but I agree that it is rather sub-optimal compared to more modern methods such as DESeq2 and edgeR (based on negative binomial modeling of the raw counts).
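To make the "log-CPM, then ANOVA" idea concrete, here is a minimal sketch in plain Python of a classical one-way ANOVA F statistic computed on log2-CPM values for a single gene. The function names, the toy counts and the library sizes are all made up for illustration, and the 0.5 and +1 offsets are just one common way to avoid taking the log of zero. This is the textbook test, not what limma actually runs.

```python
import math


def log2_cpm(counts, lib_sizes, prior_count=0.5):
    """Log2 counts-per-million for one gene across samples.

    prior_count and the +1 on the library size are illustrative
    offsets that keep the logarithm defined for zero counts.
    """
    return [
        math.log2((c + prior_count) / (n + 1.0) * 1e6)
        for c, n in zip(counts, lib_sizes)
    ]


def one_way_anova_f(groups):
    """Classical one-way ANOVA F statistic for a list of groups."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the samples around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical gene: raw counts and library sizes for 3 conditions x 3 replicates.
counts = [[120, 135, 110], [240, 260, 225], [118, 140, 125]]
lib_sizes = [[9.8e6, 1.1e7, 1.0e7], [1.0e7, 1.05e7, 9.5e6], [1.0e7, 1.2e7, 1.1e7]]

groups = [log2_cpm(c, n) for c, n in zip(counts, lib_sizes)]
f_stat = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} on ({len(groups) - 1}, {sum(map(len, groups)) - len(groups)}) df")
```

With only three replicates per group, the within-group variance in the denominator is estimated from very few values, which is the weak spot of running this test gene by gene.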

The issue is not so much "normality" (limma-voom and sleuth don't do negative binomial modeling and they work very well). The issue lies with variance estimation, which is why limma does not use t-tests/ANOVA in the traditional sense: it uses those tests but regularizes the variance estimates, which is necessary in almost all cases.
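As a rough illustration of what "regularizing the variance" means, the sketch below shrinks a gene's sample variance toward a prior variance, weighting each by its degrees of freedom, and plugs the result into a two-group t statistic. In limma the prior variance and prior degrees of freedom are estimated from all genes by empirical Bayes; here they are simply passed in as assumed numbers, and the function names are hypothetical.

```python
import math


def moderated_variance(s2_gene, df_gene, s2_prior, df_prior):
    """Shrink a per-gene variance toward a prior variance.

    The result is a degrees-of-freedom-weighted average of the two.
    s2_prior and df_prior are assumed inputs in this sketch.
    """
    return (df_prior * s2_prior + df_gene * s2_gene) / (df_prior + df_gene)


def two_group_t(mean_diff, variance, n1, n2):
    """t statistic for a difference in means given a pooled variance."""
    return mean_diff / math.sqrt(variance * (1.0 / n1 + 1.0 / n2))


# Hypothetical gene with 3 vs 3 replicates whose sample variance
# happens to be tiny purely by chance.
n1, n2 = 3, 3
df_gene = n1 + n2 - 2
s2_gene = 0.01
mean_diff = 0.5  # difference in mean log2 expression

# Assumed prior values, standing in for what would be estimated across genes.
s2_prior, df_prior = 0.25, 4

s2_mod = moderated_variance(s2_gene, df_gene, s2_prior, df_prior)
t_ordinary = two_group_t(mean_diff, s2_gene, n1, n2)
t_moderated = two_group_t(mean_diff, s2_mod, n1, n2)

print(f"ordinary t  = {t_ordinary:.2f} (df = {df_gene})")
print(f"moderated t = {t_moderated:.2f} (df = {df_gene + df_prior})")
```

The gene with an accidentally small variance no longer gets an inflated statistic, and the moderated test gains degrees of freedom from the prior.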

Most differential gene expression packages support ANOVA-like comparisons, so just stick with those.

Yes, I agree that limma is not 'classical' ANOVA but rather an extension of it. Still, I wanted to add some nuance to the clear-cut answer above stating that ANOVA cannot be used for RNA-seq analysis because the counts are not normal.

Also, my understanding is that, without the log transformation of the count data, normality would be an issue for linear model/ANOVA-based methods such as limma, but not for edgeR or DESeq2, since they assume different properties of the data.
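For what "different properties" can look like, here is a small sketch of the negative binomial mean-variance relationship that count-based methods build on: the variance is the mean plus a dispersion term that grows with the square of the mean. The function name and the dispersion value of 0.1 are illustrative assumptions, not values taken from either package.

```python
def nb_variance(mu, dispersion):
    """Variance of a negative binomial count with mean mu.

    With dispersion = 0 this reduces to the Poisson case (variance = mean).
    """
    return mu + dispersion * mu ** 2


# Hypothetical mean counts for a lowly, moderately and highly expressed gene.
for mu in (10, 100, 1000):
    poisson_var = nb_variance(mu, 0.0)
    nb_var = nb_variance(mu, 0.1)  # assumed dispersion, for illustration only
    print(f"mean = {mu:5d}  Poisson var = {poisson_var:8.0f}  NB var = {nb_var:9.0f}")
```

Because the variance is modeled as a function of the mean on the raw count scale, there is no need to transform the counts toward normality first.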

According to "A Beginner’s Guide to Analysis of RNA Sequencing Data" (https://www.atsjournals.org/doi/10.1165/rcmb.2017-0430TR), ANOVA is an appropriate analysis for RNA-seq data. However, the review doesn't specify a tool or package for doing this analysis. Searching for how to run ANOVA on RNA-seq data brought me to this page.
