Hi,
I am a bit confused about the use of the RNA vs SCT assays for DGE analysis, and I am wondering whether anybody who uses Seurat could shed some light. I have been performing the Seurat v3 integration method with SCTransform by simply following their vignette. According to some discussions and the vignette, the Seurat team indicated that the RNA assay (rather than the integrated or SCT assays) should be used for the DotPlot and FindMarkers functions, for comparing and exploring gene expression differences across cell types. But the RNA assay has raw count data while the SCT assay has scaled and normalized data. It seems to me that the numbers in the SCT assay are more appropriate for comparing DGE among cell types. Am I missing something?
Thanks.
Apologies for resurrecting this old post.
When googling "Seurat FindAllMarkers SCTransform", I have seen it suggested on many forums that running Seurat's FindAllMarkers on the SCT assay or on the RNA assay should give almost identical results.
However, even in 2025, with Seurat v5.2.1, the concern raised in the original post still holds, at least in some cases.
I am analyzing a tissue containing a mixture of smooth muscle cells (SMCs), fibroblasts, endothelial cells, and immune cells (heavily dominated by SMCs). SCTransform creates nice and consistent clusters for cell types and subtypes. Running FindAllMarkers at the cell type level gives very consistent (almost 100% match) results with either the SCT or the RNA assay.
However, when subsetting the data to a specific cell type, differences arise. In our case, subsetting the data to only immune cells and then running FindAllMarkers at the subtype level produced huge differences between the SCT and RNA assays. After some exploration of the data, it turned out that one of our subtypes (T-cells cluster #2) had significantly lower nFeature_RNA in the RNA assay than the other immune subtypes. This caused SCTransform to "fill up" these cells with a lot of "imputed/inferred/fake" gene counts. I suspect these gene counts are taken from the most abundant cell type, SMCs.
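One way this could happen, sketched in Python under simplifying assumptions: as far as I understand, the corrected counts in the SCT assay are obtained by taking each cell's Pearson residual from a fitted negative binomial model and projecting it back to counts at the median sequencing depth. The helper name, the assumption that the expected count scales proportionally with depth, and all parameter values below are made up for illustration. This is not the sctransform code itself.

```python
import math


def corrected_count(observed, cell_depth, median_depth, mu_at_median, theta):
    """Simplified depth-corrected count for one gene in one cell (hypothetical helper)."""
    # Expected count at the cell's own depth (assumes the mean scales with depth).
    mu_cell = mu_at_median * cell_depth / median_depth
    # Pearson residual under a negative binomial with variance mu + mu^2 / theta.
    residual = (observed - mu_cell) / math.sqrt(mu_cell + mu_cell ** 2 / theta)
    # Project the residual back to a count at the median depth.
    value = mu_at_median + residual * math.sqrt(mu_at_median + mu_at_median ** 2 / theta)
    # Round and clip at zero, since counts cannot be negative.
    return max(0, round(value))


# A shallow cell (10% of median depth) with zero observed counts for a gene
# whose fitted mean at median depth is 2 (illustrative values only).
print(corrected_count(0, 500, 5000, 2.0, 10.0))   # -> 1
# The same zero in a cell sequenced at the median depth stays zero.
print(corrected_count(0, 5000, 5000, 2.0, 10.0))  # -> 0
```

In this toy setting, a gene whose fitted mean is high because most cells in the dataset express it can receive a non-zero corrected count in a shallow cell where it was never observed.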
So, when running FindAllMarkers on the subsetted immune data using the SCT assay, "Acta2" (an SMC marker) appears as the #4 most significant marker in T-cells cluster #2, with p_val_adj = 0, pct.1 = 0.771 and pct.2 = 0.266. This gene (and many more SMC-associated genes) is nowhere to be found when running FindAllMarkers on the RNA assay.
In the case of Acta2, it has counts > 0 in 397/515 (77.09%) cells in the SCT assay. However, it is actually present in only 3/515 (0.58%) cells in the RNA assay. In this cell type, almost all of this gene's signal in the SCT assay is made up of artificial counts.
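The detection-rate comparison above can be reproduced with a small check like the following. It is a minimal sketch in Python; the function name is hypothetical, and the count vectors are placeholders built only to match the reported cell numbers (397 and 3 out of 515), not the real data.

```python
def detection_rate(counts):
    """Fraction of cells with at least one count for a given gene."""
    return sum(1 for c in counts if c > 0) / len(counts)


# Placeholder per-cell counts for one gene in one cluster of 515 cells.
sct_counts = [1] * 397 + [0] * 118  # corrected counts (SCT assay)
rna_counts = [1] * 3 + [0] * 512    # raw counts (RNA assay)

print(round(detection_rate(sct_counts), 3))  # -> 0.771
print(round(detection_rate(rna_counts), 4))  # -> 0.0058
```

A large gap between the two rates for the same gene and cluster is a quick sign that the SCT signal is not backed by observed counts.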
In fact, out of the top 20 markers found with the SCT assay, only 3 are considered markers at all when running FindAllMarkers on the RNA assay (looking at all significant RNA markers, not only the top 20).
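This kind of overlap check is easy to script once both marker tables are exported. Below is a minimal Python sketch; the function name is hypothetical, and the gene names other than Acta2 are placeholders, not results from this dataset.

```python
def shared_markers(top_markers, reference_markers):
    """Return the genes in top_markers that also appear in reference_markers, in order."""
    reference = set(reference_markers)
    return [gene for gene in top_markers if gene in reference]


# Placeholder marker lists for one cluster.
sct_top = ["GeneA", "GeneB", "GeneC", "Acta2", "GeneD"]  # top markers from the SCT run
rna_significant = ["GeneB", "GeneD", "GeneX", "GeneY"]   # all significant markers from the RNA run

print(shared_markers(sct_top, rna_significant))  # -> ['GeneB', 'GeneD']
```

Comparing the top SCT markers against the full list of significant RNA markers, rather than only the RNA top 20, avoids counting a gene as missing just because it ranks lower.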
Let this be a cautionary tale for future people searching about this issue.