Seurat scaling values not between 0 and 1
3 months ago
bs58 • 0

I've followed the Seurat vignette tutorial for pre-processing my scRNAseq data.

When I look at the scaled data (using the ScaleData() function), I get values between -14.18 and 10, with an average of -0.005.

Seurat vignette says:

Shifts the expression of each gene, so that the mean expression across cells is 0.

Scales the expression of each gene, so that the variance across cells is 1.

This is what I did:

library(Seurat)

# Initialize the Seurat object with the non-normalized data
h5 <- CreateSeuratObject(counts = h5.data)
dataset <- h5

#QC
dataset[["percent.mt"]] <- PercentageFeatureSet(dataset, pattern = "^MT-") # mitochondrial percentage

#Filtering
dataset <- subset(dataset, subset = nFeature_RNA > 250 & nFeature_RNA < 10000 & percent.mt < 15)

#Normalization
dataset <- NormalizeData(dataset, normalization.method = "LogNormalize", scale.factor = 10000)

#Most variable gene identification
dataset <- FindVariableFeatures(dataset, selection.method = "vst", nfeatures = 2000, verbose = FALSE, dispersion.cutoff = c(-Inf, 0.5), mean.cutoff = c(0.0125, 3))

#Scaling
all.genes <- rownames(dataset)
dataset <- ScaleData(dataset)


What did I do wrong?

scRNA-seq scaling seurat • 460 views
3 months ago
ATpoint 55k

Scaling means that each value is transformed to represent its deviation from the mean: here, for every gene, the deviation of each cell from the mean across all cells, divided by that gene's standard deviation.
So you get negative and positive values with a mean of zero (or approximately zero; I am not fully sure of this).

You can see how this works by running this R code:

#/ make some dummy data with float values:
dummy_data <- sapply(1:5, function(x) rnorm(3,100,10))

#/ now scale them and then compare by eye what happened:
scaled <- t(scale(t(dummy_data)))


This produces a dummy count matrix with three "genes" and five "cells" and then scales it.
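To check the result rather than compare by eye, the sketch below repeats the dummy example and verifies that each "gene" (row) ends up with mean 0 and variance 1. The fixed seed and the variable names `row_means`, `row_vars` and `manual` are assumptions added for illustration; they are not part of the original answer.

```r
# Assumption: a fixed seed, only so the example is reproducible
set.seed(1)

# Three "genes" (rows) and five "cells" (columns), as in the example above
dummy_data <- sapply(1:5, function(x) rnorm(3, 100, 10))

# scale() works on columns, so transpose to scale per gene, then transpose back
scaled <- t(scale(t(dummy_data)))

# Per-gene mean and variance after scaling
row_means <- rowMeans(scaled)
row_vars <- apply(scaled, 1, var)
print(round(row_means, 10))
print(row_vars)

# The same transformation written out by hand: (x - mean) / sd for each gene
manual <- (dummy_data - rowMeans(dummy_data)) / apply(dummy_data, 1, sd)
print(all.equal(manual, scaled, check.attributes = FALSE))
```

The row means are zero up to floating-point error and the row variances are 1, yet individual values are not confined to any fixed interval.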


What confused me is the following sentence from the Seurat package: "Scales the expression of each gene, so that the variance across cells is 1", which made me believe that if the mean is 0 and the variance is 1, all the values should be between 0 and 1.


Think about it: if the mean is zero and the variance is nonzero, then there must be negative values, simply by how mean and variance work mathematically.


You're right! But then the values should be between -1 and 1, right? Not between -14 and 10.


No. You can simulate this: hist(rnorm(1000, 0, 1)). Some outliers will probably produce these extreme values, but if the sample size is large enough they have only a modest influence on the total variance.
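The simulation can be made concrete with a small sketch. It first shows that a sizeable fraction of unit-variance normal values falls outside [-1, 1], and then uses a made-up sparse "gene" (expressed in 1 of 1000 cells, an assumption chosen for illustration) to show how a single outlier reaches a z-score of about 31.6 while the variance stays at 1. The last step mimics the clipping that ScaleData() applies through its scale.max argument (default 10), which would be consistent with the maximum of exactly 10 reported in the question and may also explain a mean slightly below zero; that link to the asker's data is an assumption, not something verified here.

```r
# Assumption: a fixed seed, only so the example is reproducible
set.seed(42)

# Unit-variance normal data: many values lie outside [-1, 1]
z <- rnorm(1000, mean = 0, sd = 1)
frac_outside <- mean(abs(z) > 1)
print(frac_outside)

# Hypothetical sparse gene: zero in 999 cells, expressed in one cell
sparse_gene <- c(rep(0, 999), 5)
sparse_scaled <- as.numeric(scale(sparse_gene))

# Variance is still 1, but the single expressing cell gets an extreme z-score
print(var(sparse_scaled))
print(max(sparse_scaled))

# Mimic clipping at an upper limit of 10 (ScaleData's scale.max default)
clipped <- pmin(sparse_scaled, 10)
print(max(clipped))
print(mean(clipped))  # no longer exactly zero once the top value is clipped
```

A variance of 1 constrains the typical spread of the values, not their range, so extreme z-scores are expected for genes detected in only a few cells.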