Question

Should I scale all genes in single cell Seurat?

23 months ago

synat.keam ▴ 120

Apologies for the many posts this week. In Seurat, should I scale all genes for downstream analysis, or is scaling only some features okay? I am a bit unclear when it comes to scaling. I have attached my code here.

Also, how do I know which genes are noise or confounding genes? Could you point me to an R script for identifying noise genes and filtering them out? I really appreciate your help so far!

data.filt<- NormalizeData(data.filt)


#cell cycle 

data.filt<- CellCycleScoring(data.filt, s.features = cc.genes$s.genes, g2m.features = cc.genes$g2m.genes, set.ident = TRUE)

VlnPlot(data.filt, features = c("S.Score", "G2M.Score"), group.by = "orig.ident",
    ncol = 4, pt.size = 0.1)

data.filt<- FindVariableFeatures(data.filt, selection.method = "vst", verbose = FALSE, nfeatures = 2000)

# Scale data 
all.genes<- rownames(data.filt)


# Option 1: scale only the variable features (the ScaleData default)

data.filt<- ScaleData(data.filt, 
                      vars.to.regress = c(
                        "nCount_RNA","nFeature_RNA", "percent_mito", 
                                          "percent_ribo", "S.Score", "G2M.Score"
                        ), 
                      verbose= FALSE)

# Option 2: scale all genes (all.genes is already a character vector of gene names)
data.filt<- ScaleData(data.filt, 
                      vars.to.regress = c(
                        "nCount_RNA","nFeature_RNA", "percent_mito", 
                                          "percent_ribo", "S.Score", "G2M.Score"
                        ), 
                       features = all.genes,
                      verbose= FALSE)

# PCA

data.filt<- RunPCA(data.filt, verbose = FALSE)

score 4 · Accepted Answer · 2023-12-01

Hi,

Scaling all features can be useful if you want to plot genes that are not among the 2,000 highly variable genes (HVGs) in a heatmap. Other than that, I have never encountered a specific analysis that needed all genes scaled, although such an analysis may well exist. So I would say it depends on which downstream analyses you are interested in and what type of input data they require.
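
To illustrate why the choice mostly affects which genes are available rather than their values, here is a minimal base-R sketch. It assumes plain per-gene centering and unit-variance scaling (what `ScaleData` does by default, apart from clipping extreme values), and the toy matrix, gene names, and the helper `zscore_rows` are made up for illustration.

```r
# Toy log-normalised expression matrix: genes in rows, cells in columns.
# Names and values are invented for illustration only.
set.seed(1)
expr <- matrix(rexp(5 * 20), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("cell", 1:20)))

# Hypothetical helper: z-score each gene (row) to mean 0 and unit variance.
zscore_rows <- function(m) t(scale(t(m)))

# Scale every gene, then scale only two genes of interest.
scaled_all    <- zscore_rows(expr)
scaled_subset <- zscore_rows(expr[c("gene2", "gene4"), , drop = FALSE])

# Each gene is scaled independently, so the shared genes get the same values.
same <- all.equal(scaled_all[c("gene2", "gene4"), ], scaled_subset,
                  check.attributes = FALSE)
print(same)
```

Because each gene is handled on its own, scaling only the HVGs for PCA and later scaling a few extra genes for a heatmap should give those genes the same values as scaling everything up front. The subset is just cheaper in time and memory.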

Regarding your second question, there may be more sophisticated analyses one could do, but plotting the expression of such genes or features on a PCA usually gives a rough idea of whether a gene or cell feature (e.g., the percentage of mitochondrial genes) represents noise or a confounding variable. Correlating the suspected noise or confounding genes/features with the PCs (principal components) can help to quantify such effects. Of course, the challenge lies in distinguishing between confounding and biologically meaningful genes/features. A gene might be correlated with PC1 because it drives differentiation and is therefore biologically meaningful. UMI counts, on the other hand, might be correlated with PC2, which could indicate technical noise. The decision always needs to be weighed against your expectations, considering the experimental design, the biological conditions and cell types, as well as the questions you are trying to answer.
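
The correlation idea above can be sketched in base R as follows. This is only an illustration under stated assumptions: `cor_with_pcs` is a hypothetical helper, and the data are simulated so that PC_2 deliberately tracks the UMI count. They are not real results.

```r
# Hypothetical helper: correlate each per-cell covariate with each PC.
# embeddings: cells x PCs numeric matrix.
# covariates: data frame of numeric per-cell variables, same cell order.
cor_with_pcs <- function(embeddings, covariates, method = "spearman") {
  stopifnot(nrow(embeddings) == nrow(covariates))
  cor(as.matrix(covariates), embeddings, method = method)
}

# Simulated stand-in data: PC_2 is constructed to follow the UMI count.
set.seed(42)
n_cells <- 200
umi <- rpois(n_cells, lambda = 5000)
pcs <- cbind(PC_1 = rnorm(n_cells),
             PC_2 = as.numeric(scale(umi)) + rnorm(n_cells, sd = 0.3),
             PC_3 = rnorm(n_cells))
meta <- data.frame(nCount_RNA = umi,
                   percent_mito = runif(n_cells, 0, 10))

# Rows are covariates, columns are PCs; large absolute values flag candidates.
res <- cor_with_pcs(pcs, meta)
print(round(res, 2))
```

With a Seurat object, the inputs could presumably be taken from `Embeddings(data.filt, reduction = "pca")` and the relevant columns of `data.filt@meta.data`. A strong correlation only flags a candidate; whether it is noise or biology still depends on the experimental design.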

I hope this helps.

Best regards,

António