Question

Seurat CellCycleScoring – confused about the proper order of operations when using SCTransform

2.3 years ago

GPM ▴ 10

I have started to use Seurat to analyze data from a scRNASeq experiment, and I would like to calculate cell cycle scores for my dataset using the CellCycleScoring function (let's leave regressing out unwanted/uninteresting sources of variation out of the discussion for now). According to the “vanilla” PBMC 3K Guided Tutorial and Seurat Cell-Cycle Scoring vignette, the count data must be normalized using the NormalizeData function before invoking CellCycleScoring (so that the order of operations when not using SCTransform is: CreateSeuratObject - NormalizeData - CellCycleScoring - FindVariableFeatures - ScaleData, or alternatively CreateSeuratObject - NormalizeData - FindVariableFeatures - ScaleData - CellCycleScoring).

On the other hand, the vignette for SCTransform states that SCTransform replaces NormalizeData, ScaleData, and FindVariableFeatures. Does this imply that CellCycleScoring can be invoked directly after SCTransform (CreateSeuratObject - SCTransform - CellCycleScoring), or do the data have to be normalized with NormalizeData and scored before invoking SCTransform (CreateSeuratObject - NormalizeData - CellCycleScoring - SCTransform), as suggested here? I am asking because the number of cells assigned to each phase differs depending on the exact procedure used, as shown below for four different scenarios.

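# Packages used below: Seurat (normalization and scoring), dplyr (%>%),
# ggplot2 (bar plots), cowplot (plot_grid)
library(Seurat)
library(dplyr)
library(ggplot2)
library(cowplot)

#1. Non-normalized data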
obj_0<-Filtered_seurat_object
obj_0<-CellCycleScoring(obj_0, g2m.features = g2m_genes, s.features = s_genes)
obj_0<-FindVariableFeatures(obj_0, selection.method = "vst", nfeatures = 2000)
obj_0<-ScaleData(obj_0)
obj_0<-RunPCA(obj_0)
a<-obj_0@meta.data %>% ggplot(aes(Phase)) + geom_bar() + ggtitle("1.Non-normalized data") + theme(plot.title = element_text(size = 8))

#2. NormalizeData & SCTransform
obj_1<-NormalizeData(Filtered_seurat_object)
obj_1<-CellCycleScoring(obj_1, g2m.features = g2m_genes, s.features = s_genes)
obj_1<-SCTransform(obj_1, vst.flavor = "v2")
obj_1<-RunPCA(obj_1)
b<-obj_1@meta.data %>% ggplot(aes(Phase)) + geom_bar() + ggtitle("2.NormalizeData & SCTransform") + theme(plot.title = element_text(size = 8))

#3. SCTransform only
obj_2<-SCTransform(Filtered_seurat_object, vst.flavor = "v2")
obj_2<-CellCycleScoring(obj_2, g2m.features = g2m_genes, s.features = s_genes)
obj_2<-RunPCA(obj_2)
c<-obj_2@meta.data %>% ggplot(aes(Phase)) + geom_bar() + ggtitle("3.SCTransform only") + theme(plot.title = element_text(size = 8))

#4. NormalizeData & ScaleData (No SCTransform)
obj_3<-NormalizeData(Filtered_seurat_object)
obj_3<-CellCycleScoring(obj_3, g2m.features = g2m_genes, s.features = s_genes)
obj_3<-FindVariableFeatures(obj_3, selection.method = "vst", nfeatures = 2000)
obj_3<-ScaleData(obj_3)
obj_3<-RunPCA(obj_3)
d<-obj_3@meta.data %>% ggplot(aes(Phase)) + geom_bar() + ggtitle("4.NormalizeData & ScaleData\n(No SCTransform)") + theme(plot.title = element_text(size = 8))

plot_grid(a,b,c,d, align = "h", ncol=4, labels = "AUTO", label_size = 8)

Here are the plots showing the number of cells in each phase for each of the four methods. I cannot help noticing that the distribution of cells across phases in the non-normalized dataset (1) resembles the pattern obtained after invoking SCTransform without NormalizeData (3), whereas the number of cells in each phase is identical for methods (2) and (4), in which NormalizeData was called before CellCycleScoring.

[Figure: bar plots of the number of cells per cell-cycle phase for scenarios 1 to 4]

Given that "SCTransform replaces NormalizeData, ScaleData, and FindVariableFeatures", I do not know which procedure, or order of operations, most accurately describes the state of the cells in the dataset. Why are the distributions of cells across phases similar for non-normalized and SCTransformed data? Does this mean that non-normalized data can be used for CellCycleScoring? What am I doing wrong or missing here? Thanks to anyone reading this post and helping!

CellCycleScoring SCTransform Seurat • 1.7k views

Hi, I have the same question. Can anyone help?

written 9 months ago by shweta.sahni

score 0 · Answer 1 · 2024-05-08

I'm not an expert in SCTransform or CellCycleScoring, but I think this happens because SCTransform creates a new assay named "SCT", which is separate from the original "RNA" assay. I believe the cell cycle scoring uses the "RNA" assay, which has not been normalized. If there is a way to pass the SCT assay to it, that might work, but I could not find one. My guess is that the best approach is to normalize the RNA assay as well, even though SCTransform has already been run.
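A minimal sketch of that suggestion, reusing `Filtered_seurat_object`, `s_genes`, and `g2m_genes` from the question (they are not defined here, so the snippet will not run without them). It assumes that CellCycleScoring reads from whichever assay is currently the default, which is why the default is switched to "RNA" for normalization and scoring and then back to "SCT". Treat it as something to check against your Seurat version, not as the recommended workflow.

```r
library(Seurat)

# Assumes Filtered_seurat_object, s_genes and g2m_genes exist as in the question.
obj <- SCTransform(Filtered_seurat_object, vst.flavor = "v2")

# Check which assay is active after SCTransform before scoring anything.
print(DefaultAssay(obj))

# Normalize the RNA assay and score the cell cycle on it.
DefaultAssay(obj) <- "RNA"
obj <- NormalizeData(obj)
obj <- CellCycleScoring(obj, s.features = s_genes, g2m.features = g2m_genes)

# Switch back so downstream steps such as RunPCA use the SCT assay.
DefaultAssay(obj) <- "SCT"

# Number of cells assigned to each phase.
table(obj$Phase)
```

Comparing `table(obj$Phase)` with scenario 2 from the question should show whether the two orders of operations agree on your data.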