We take the logFC calculations in our favourite packages for granted, but they are not as straightforward as they seem. Let's take Seurat's way of calculating logFC:
> Seurat:::FoldChange.default
function (object, cells.1, cells.2, mean.fxn, fc.name, features = NULL,
    ...)
{
    features <- features %||% rownames(x = object)
    thresh.min <- 0
    pct.1 <- round(x = rowSums(x = object[features, cells.1,
        drop = FALSE] > thresh.min)/length(x = cells.1), digits = 3)
    pct.2 <- round(x = rowSums(x = object[features, cells.2,
        drop = FALSE] > thresh.min)/length(x = cells.2), digits = 3)
    data.1 <- mean.fxn(object[features, cells.1, drop = FALSE])
    data.2 <- mean.fxn(object[features, cells.2, drop = FALSE])
    fc <- (data.1 - data.2)
    fc.results <- as.data.frame(x = cbind(fc, pct.1, pct.2))
    colnames(fc.results) <- c(fc.name, "pct.1", "pct.2")
    return(fc.results)
}
<bytecode: 0x324be7970>
<environment: namespace:Seurat>
where mean.fxn is, by default, calculated as:
default.mean.fxn <- function(x) {
    return(log(x = (rowSums(x = expm1(x = x)) + pseudocount.use)/NCOL(x), base = base))
}
So in Seurat the logFC is the log of a ratio of arithmetic means, unlike in presto::wilcoxauc, which uses the geometric mean.
In presto we calculate the logFC as follows:
lfc <- Reduce(cbind, lapply(seq_len(length(levels(y))), function(g) {
    group_means[, g] - ((cs - group_sums[g, ]) / (length(y) - gs[g]))
}))
So that is the log of a ratio of geometric means (a difference of mean log-expression values). According to Lötsch et al. (2024), this is better suited for scRNA-seq data: the arithmetic mean leads to inaccuracies when subgroup variances or underlying distributions differ, while the geometric mean or median yields more robust estimates in heterogeneous data.
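The practical difference is easy to provoke with made-up numbers: a single extreme cell inflates the arithmetic mean, and with it the Seurat-style logFC, while the presto-style geometric-mean logFC barely moves (both put on the log2 scale here for comparability):

```r
# Two groups of log1p-normalized values for one gene (made-up numbers);
# group 1 contains a single extreme cell
g1 <- log1p(c(5, 6, 5, 500))
g2 <- log1p(c(5, 6, 5, 6))

# Arithmetic-mean logFC (Seurat-style: pseudocount 1, base 2)
lfc_arith <- log2(mean(expm1(g1)) + 1) - log2(mean(expm1(g2)) + 1)

# Geometric-mean logFC (presto-style: difference of mean log values),
# converted from natural log to log2 for comparability
lfc_geom <- (mean(g1) - mean(g2)) / log(2)

lfc_arith  # ~ 4.4: dominated by the one outlier cell
lfc_geom   # ~ 1.5: far less affected
```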
So should we just conclude that presto's implementation is better and that Seurat should switch to it too?
I personally think that the DE machinery in Seurat is entirely terrible and oversimplified. By default, it simply tests a given group of cells (e.g. a cluster) against the average of the union of all other cells, regardless of which cluster those cells belong to. There is also no way to test against a fold-change threshold, so you end up with a lot of genes that have small effect sizes but statistical significance. That might in some cases be OK for markers alone, where you essentially get the top-top-top genes per cluster, but for a proper pairwise analysis it's imo largely useless.
The thing here is also that the Wilcoxon test purely focuses on rank differences; it does not test the fold changes. Still, most people will filter on the fold change, which is not robust. Fold changes in scRNA-seq often have large variances due to the spread of expression values and the presence of zeros. If you use a method like limma and let it return confidence intervals, you will see that impressively. Or just plot the single-cell expression of a couple of genes to visualize it easily. Anyway, I think it's basic data science knowledge that, especially in the presence of large variances, you need to go geometric, not arithmetic.
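A quick way to see how noisy arithmetic fold changes are in sparse data: resample cells from two identical, mostly-zero populations (all numbers made up; pseudocount of 1, Seurat-style) and look at the spread of the resulting logFC estimates around the true value of 0:

```r
set.seed(42)
# One sparse gene in two populations with identical distributions
# (made-up numbers): ~90% zeros, the rest Poisson counts
pop1 <- rbinom(2000, size = 1, prob = 0.1) * rpois(2000, lambda = 4)
pop2 <- rbinom(2000, size = 1, prob = 0.1) * rpois(2000, lambda = 4)

# Resample 50 cells per group and record the arithmetic-mean logFC
# (with a pseudocount of 1); the true logFC is 0
lfcs <- replicate(1000, {
  s1 <- sample(pop1, 50)
  s2 <- sample(pop2, 50)
  log2((mean(s1) + 1) / (mean(s2) + 1))
})
quantile(lfcs, c(0.025, 0.975))  # a wide interval around 0, identical groups
```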
I personally always use limma for any DE testing. Pseudobulk is almost always better than single-cell level, but the experimental design must enable it -- most studies unfortunately do not care about biological replication. When using limma, you can relatively easily do one-versus-average testing with a contrast like A - (B+C+D)/3 to get the genes that are overexpressed in A versus the average of the rest, but group-aware, unlike what Seurat does. In a nutshell, I would prefer geometric mean-based logFCs if you have to choose between your two options, but would actually rather use limma to get them properly moderated, and with confidence intervals. limma also allows testing against fold changes with its treat function, so you can protect against genes with tiny effect sizes but statistical significance. Unfortunately, neither Seurat nor presto supports this, to my knowledge. Post-hoc filtering on logFC is not a good substitute, because oftentimes there is evidence that the logFC between groups is not 0 (the default test), but little evidence that it is, e.g., > 1.1
because of all the noise. Such testing, in my hands, protects against a lot of spurious DEGs. Also essential: proper prefiltering; see e.g. the work from Soneson et al. on single-cell DE testing (Nature Methods, a few years ago). End of my rant, thank you. Don't drag this into an answer, it's just an "I had no coffee" rant.

To play the devil's advocate: the Seurat function names are
FindMarkers or FindAllMarkers, which characterize the most representative marker genes of a given cluster versus another cluster or against all other clusters. However, the logFC in their calculation is highly biased and should be seen more as a confidence score than as a strict logFC, and the documentation should mention that clearly.

I usually use the Seurat functions to find marker genes and plot their expression on the UMAP, but for DEGs a better way to go is pseudobulk plus common RNA-seq DE tools like limma/DESeq2/edgeR...
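For completeness, a minimal sketch of that pseudobulk + limma route, including the one-versus-average contrast and treat-based testing against a minimum fold change discussed above (assumes limma is installed; sample names, group labels, and the expression matrix are all made up):

```r
library(limma)

set.seed(7)
# Made-up pseudobulk log-expression matrix: 100 genes x 12 samples,
# 3 biological replicates for each of 4 groups A-D
y <- matrix(rnorm(100 * 12, mean = 6), nrow = 100,
            dimnames = list(paste0("gene", 1:100), paste0("s", 1:12)))
group <- factor(rep(c("A", "B", "C", "D"), each = 3))

design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# A versus the average of the other groups, group-aware
cm <- makeContrasts(A - (B + C + D) / 3, levels = design)

fit <- lmFit(y, design)
fit <- contrasts.fit(fit, cm)

# treat() tests H0: |logFC| <= log2(1.1) instead of logFC == 0,
# protecting against tiny-effect-size hits
fit <- treat(fit, lfc = log2(1.1))
res <- topTreat(fit, coef = 1, number = Inf)
head(res)
```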
I would precisely not see it as a confidence score, as they provide no variance or confidence intervals. The value can be gigantically large, but that does not increase confidence if it is driven by variance and noise.
In the latest version of Seurat, presto is used by default (if found in the environment) for the logFC calculation, as in FindMarkers():
https://satijalab.org/seurat/articles/pbmc3k_tutorial.html
Seurat's default behavior for Find(All)Markers is indeed to use presto whenever available, as you suggest. However, it only uses presto to calculate the p-value (and even then it adjusts it with Bonferroni correction, unlike the Benjamini-Hochberg correction in presto itself): https://github.com/satijalab/seurat/blob/e44cb2ce21aff3cbd94c2ddf24f9a1db6745e121/R/differential_expression.R#L2502C3-L2502C64

The logFC, in the meantime, is calculated independently by Seurat itself: https://github.com/satijalab/seurat/blob/e44cb2ce21aff3cbd94c2ddf24f9a1db6745e121/R/differential_expression.R#L787C3-L797C4

So I guess they want to stick with their own way of calculating logFC.
I have already opened an "issue" (more of a discussion) about the use of the pseudocount, which was distorting the logFC results. They argued that it is there to reduce the number of DE genes with low expression values.