Question: How to identify differentially expressed genes if the normalization assumptions are violated
afli170 wrote:

Hi, I ran a DESeq2 analysis of RNA-seq samples from two different tissues, each with 3 replicates. Both the PCA and the sample correlation heatmap look normal and as expected. However, about 60% of all genes come out as differentially expressed (padj < 0.1), which violates the DESeq2 assumption that most genes are not differentially expressed. So using DESeq2 may not be appropriate.

I searched and found two related posts: & Differential expression for two very different samples. But I still could not find a good solution for my case.

I know that a large proportion of genes must be differentially expressed between the tissues; the problem is how to identify them correctly. I think many of you have run into this when analyzing RNA-seq samples from different tissues. How do you deal with it?

Thank you very much!


With padj < 0.1, it is possible that you are including a high number of false-positive genes. You could try more stringent cut-offs, such as padj < 0.05 together with an absolute log2FC > 2 (commonly used in this field, although it varies with the research question). That way you should still get a substantial number of significant genes.
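
To make the stricter cut-offs concrete, here is a minimal sketch in plain Python. The rows are made-up toy values, not real output; only the column names `log2FoldChange` and `padj` follow the DESeq2 results table. The helper `filter_de` and the gene names are hypothetical, and I am assuming the threshold is meant on the absolute log2 fold change.

```python
# Toy rows standing in for a DESeq2 results table (values are invented).
results = [
    {"gene": "geneA", "log2FoldChange": 3.1, "padj": 0.001},
    {"gene": "geneB", "log2FoldChange": -2.6, "padj": 0.03},
    {"gene": "geneC", "log2FoldChange": 0.4, "padj": 0.02},
    {"gene": "geneD", "log2FoldChange": 2.8, "padj": 0.08},
    {"gene": "geneE", "log2FoldChange": -3.0, "padj": None},
]


def filter_de(rows, padj_cutoff=0.05, lfc_cutoff=2.0):
    """Keep genes passing both the padj and the absolute log2FC cut-off."""
    kept = []
    for row in rows:
        padj = row["padj"]
        # DESeq2 can report NA for padj; such genes are skipped here.
        if padj is None:
            continue
        if padj < padj_cutoff and abs(row["log2FoldChange"]) > lfc_cutoff:
            kept.append(row["gene"])
    return kept


selected = filter_de(results)
print(selected)
```

With these toy rows only geneA and geneB pass: geneC changes too little, geneD misses the padj cut-off, and geneE has no adjusted p-value.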

Devon Ryan89k wrote:

You run into problems when there are global changes in expression that aren't (roughly) symmetric. It's OK if the majority of genes are DE, as long as they're not more or less changing in one direction. This is the case for all standard normalization methods and not particular to DESeq2. Have a look at the MA plot and next time consider adding some spike-ins to double check that the results are reasonable.
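
The point about symmetry can be illustrated with a small sketch of median-of-ratios size factors, the type of normalization DESeq2 uses by default. This is a simplified plain-Python re-implementation for illustration, not the DESeq2 code, and the counts are invented: 10 genes with one sample per tissue, where 60% of genes change 4-fold either in both directions or only upwards.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors. counts: one list of gene counts per sample."""
    n_samples = len(counts)
    n_genes = len(counts[0])
    # Log geometric mean per gene; genes with a zero count are left out.
    log_geo = []
    for g in range(n_genes):
        values = [sample[g] for sample in counts]
        if all(v > 0 for v in values):
            log_geo.append(sum(math.log(v) for v in values) / n_samples)
        else:
            log_geo.append(None)
    factors = []
    for sample in counts:
        log_ratios = [math.log(sample[g]) - log_geo[g]
                      for g in range(n_genes) if log_geo[g] is not None]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalized_log2fc(a, b):
    """Log2 fold change of sample b over sample a after size-factor scaling."""
    sf_a, sf_b = size_factors([a, b])
    return [math.log2((y / sf_b) / (x / sf_a)) for x, y in zip(a, b)]


tissue_a = [100] * 10
# 60% of genes DE, symmetric: 3 up 4-fold, 3 down 4-fold, 4 unchanged.
tissue_b_symmetric = [400] * 3 + [25] * 3 + [100] * 4
# 60% of genes DE, one-sided: 6 up 4-fold, 4 unchanged.
tissue_b_one_sided = [400] * 6 + [100] * 4

sf_symmetric = size_factors([tissue_a, tissue_b_symmetric])
sf_one_sided = size_factors([tissue_a, tissue_b_one_sided])
lfc_one_sided = normalized_log2fc(tissue_a, tissue_b_one_sided)

print([round(s, 3) for s in sf_symmetric])
print([round(s, 3) for s in sf_one_sided])
print([round(v, 3) for v in lfc_one_sided])
```

In the symmetric case both size factors come out at about 1, so the 60% DE genes do no harm. In the one-sided case the factors are about 0.5 and 2, so after normalization the truly up-regulated genes look unchanged and the unchanged genes look 4-fold down. That kind of shift is what an MA plot or spike-ins would help to spot.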

