Hey everyone,

Over the last couple of years I’ve analyzed several different scRNA-seq datasets of epithelia, and in most of them I found that one of the two groups in each experiment (the conditions differed each time) was enriched for mitochondrial and ribosomal pathways (note: the mitochondrial pathways I’m talking about are based on nuclear-encoded genes).

I wonder: generally speaking, is enrichment of mitochondrial/ribosomal pathways more likely to be an artifact than enrichment of other types of pathways? If so, do you have any advice on how to deal with that?

Cheers, Omer

(I've also asked this on the "Seurat" forum, but I suspect my chances of getting some useful answers are probably higher here)

These genes are generally highly expressed, so there is some power bias: differences in them are easier to detect as significant. How do you do the DE analysis? Any cutoff on minimal fold change?

Thanks a lot for the response!

I'm doing pathway enrichment analysis (using clusterProfiler) on almost all genes in each cluster, with the only cutoff applied being that the gene has to be detectable in all mice. The enrichment analysis is based on p-values (more accurately, on the -log10 of the p-value, with a p-value of a gene with a positive logFC getting a "+" sign and a p-value of a gene with a negative logFC getting a "-" sign).

If you do DE analysis at the single-cell level, I find it critical to enforce a fold-change cutoff: many cells mean a lot of power, so even tiny fold changes come out as significant. In this case it could well be that you're seeing artifacts. These genes are highly expressed, so you have lots of power from the counts as well as from the many cells, and your fold changes could indeed be close to zero, yet significant. See also the thread "scRNAseq Differential expression analysis".
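To make the power argument above concrete, here is a minimal sketch. It is not taken from any DE package: it uses a plain two-sample z-test with a known common standard deviation as a stand-in, and the function names, the expression values, and the 0.25 log2 fold-change threshold are all made up for illustration.

```python
from math import erfc, log2, sqrt


def two_sample_z_p_value(mean_a, mean_b, sd, n_per_group):
    """Two-sided z-test p-value for two equal-sized groups with a common SD.

    Simplified stand-in for a real DE test; assumes the SD is known.
    """
    standard_error = sd * sqrt(2.0 / n_per_group)
    z = abs(mean_a - mean_b) / standard_error
    return erfc(z / sqrt(2.0))


def passes_cutoffs(p_value, log2_fc, alpha=0.05, min_abs_log2_fc=0.25):
    """Keep a gene only if it is significant AND its effect size is large enough.

    Both thresholds are arbitrary illustrative values.
    """
    return p_value < alpha and abs(log2_fc) >= min_abs_log2_fc


# Hypothetical highly expressed gene with a 3% difference between groups.
mean_a, mean_b, sd = 10.0, 10.3, 3.0
log2_fc = log2(mean_b / mean_a)

p_few_cells = two_sample_z_p_value(mean_a, mean_b, sd, n_per_group=20)
p_many_cells = two_sample_z_p_value(mean_a, mean_b, sd, n_per_group=5000)

print(f"log2FC = {log2_fc:.3f}")
print(f"p with 20 cells per group:   {p_few_cells:.3g}")
print(f"p with 5000 cells per group: {p_many_cells:.3g}")
print("kept after cutoffs:", passes_cutoffs(p_many_cells, log2_fc))
```

The same tiny fold change is far from significant with 20 cells per group and highly significant with 5000, which is why a p-value filter alone is not enough at the single-cell level.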

Sorry, I now realize my explanation was unclear. I'm not running DE analysis on a single-cell basis. Instead, for every cluster, I compute the average of each gene in each mouse (4 mice per group, two groups), then compute the p-value of the comparison between the 4 mice in one group and the 4 mice in the other group. I then run a sort-of GSEA analysis (using "clusterProfiler") where the ranking is based on the p-value. Here's an example, to try and better explain myself:

Let's take the gene "Rrs1" (row 7) as an example: in the particular cluster I'm showing here, its average expression across all of the cells (including those where it's undetectable) of mouse "WT1" is 0.16666667, in "WT2" the average is 0.05434783, and so forth. I then compare the 4 average values (1 per mouse, 4 mice per group) obtained for this gene (in this cluster) in the WT group to the 4 average values obtained in the KO group in order to come up with a p-value, which I then turn into what I call here "logP", which is actually -log10 of the p-value (so that the smaller the p-value is, the higher the "logP" will be).

The "metric" column is the value found in the "logP" column, but with a sign: if the gene is, on average (of the per-mouse averages), higher in the WT group, the sign is "+", while if its expression is on average higher in the KO group, the sign is "-". I then run a GSEA-like analysis on the "Gene"-"metric" table using "clusterProfiler". Note: this kind of GSEA-like analysis is done on all genes (so, those where there is a clear difference between the groups, but also those where there isn't one).
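For readers who want to see the signed -log10(p) ranking spelled out, here is a minimal sketch starting from per-mouse averages. The post does not say which test produces the p-value, so an exact permutation test on the difference of group means is swapped in purely as an assumption; the gene names and values are hypothetical, and the actual analysis is done in R with clusterProfiler rather than Python.

```python
from itertools import combinations
from math import log10
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for the difference in group means.

    Assumed stand-in: the original post does not name the test it uses.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))
    hits = 0
    n_splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
        n_splits += 1
    return hits / n_splits


def signed_metric(wt, ko):
    """-log10(p), positive if the WT mean is higher, negative if KO is higher."""
    p_value = permutation_p_value(wt, ko)
    sign = 1.0 if mean(wt) >= mean(ko) else -1.0
    return sign * -log10(p_value)


# Hypothetical per-mouse averages for one cluster: (4 WT mice, 4 KO mice).
averages = {
    "GeneA": ([1.0, 1.2, 1.1, 1.3], [0.2, 0.3, 0.1, 0.4]),
    "GeneB": ([0.2, 0.3, 0.1, 0.4], [1.0, 1.2, 1.1, 1.3]),
    "GeneC": ([0.5, 0.6, 0.4, 0.7], [0.45, 0.7, 0.4, 0.6]),
}

# Ranked list (highest metric first), the kind of input a GSEA-like tool expects.
ranked = sorted(
    ((gene, signed_metric(wt, ko)) for gene, (wt, ko) in averages.items()),
    key=lambda item: item[1],
    reverse=True,
)
for gene, metric in ranked:
    print(f"{gene}\t{metric:+.3f}")
```

One limitation of this particular stand-in: with 4 vs 4 mice there are only 70 possible splits, so the smallest two-sided p-value it can return is 2/70, and many genes end up tied in the ranking.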

I'm unclear on why you would ask this question without providing more context about the cell states that are being explored by the experiments.

Energy metabolism and ribosomal synthesis pathways are both fairly commonly upregulated by processes favoring cell growth / cell division / anabolism.

So it's not a short list of states in which these are DE. However, for the results to be interesting, you'd likely want to also comment on how these differential expression results relate to other findings in the experiment, i.e., the co-expression of these pathways with other pathways of interest, in particular anything that might be unexpected in such contexts.