Question

Seurat scaling values not between 0 and 1

0

4.1 years ago

bs58 ▴ 10

I've followed the Seurat vignette tutorial for pre-processing my scRNAseq data.

When I look at the scaled data (using the ScaleData() function), I get values between -14.18 and 10, with an average of -0.005.

Seurat vignette says:

Shifts the expression of each gene, so that the mean expression across cells is 0
Scales the expression of each gene, so that the variance across cells is 1

This is what I did:

# Initialize the Seurat object with the non-normalized data
h5 <- CreateSeuratObject(counts = h5.data)
dataset <- h5

# QC
dataset[["percent.mt"]] <- PercentageFeatureSet(dataset, pattern = "^MT-") # mitochondrial percentage

# Filtering
dataset <- subset(dataset, subset = nFeature_RNA > 250 & nFeature_RNA < 10000 & percent.mt < 15)

# Normalization
dataset <- NormalizeData(dataset, normalization.method = "LogNormalize", scale.factor = 10000)

# Most variable gene identification
dataset <- FindVariableFeatures(dataset, selection.method = "vst", nfeatures = 2000, verbose = FALSE, dispersion.cutoff = c(-Inf, 0.5), mean.cutoff = c(0.0125, 3))

# Scaling
# Note: all.genes is defined but not passed to ScaleData(), so only the variable features get scaled
all.genes <- rownames(dataset)
dataset <- ScaleData(dataset)

What did I do wrong?

scRNA-seq scaling seurat • 4.1k views

ADD COMMENT • link updated 4.1 years ago by ATpoint 88k • written 4.1 years ago by bs58 ▴ 10

score 3 · Accepted Answer · 2021-06-30

4.1 years ago

ATpoint 88k

Scaling transforms each value into its deviation from the mean, expressed in units of standard deviation: here, for every gene, the deviation of each cell from that gene's mean across all cells.
So you get both negative and positive values with a mean of zero (or approximately zero; I am not fully sure whether it is exactly zero).
What you see is normal and expected; see also: https://en.wikipedia.org/wiki/Standard_score

You can see how this works by running this R code:

#/ make some dummy data with float values:
dummy_data <- sapply(1:5, function(x) rnorm(3,100,10))

#/ now scale them and then compare by eye what happened:
scaled <- t(scale(t(dummy_data)))

This produces a dummy matrix with three "genes" (rows) and five "cells" (columns) and then scales each gene across the cells.
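To check the result numerically rather than by eye, the same dummy example can be extended as below. This is only a sketch in base R; the fixed seed and the helper variable names (row_means, row_sds, value_range) are my own additions for illustration and are not part of the original answer.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible

# Same construction as above: 3 "genes" (rows) x 5 "cells" (columns)
dummy_data <- sapply(1:5, function(x) rnorm(3, 100, 10))

# scale() works column-wise, so transpose, scale, and transpose back
scaled <- t(scale(t(dummy_data)))

row_means   <- rowMeans(scaled)       # per-gene mean, zero up to floating-point error
row_sds     <- apply(scaled, 1, sd)   # per-gene standard deviation, equal to 1
value_range <- range(scaled)          # smallest and largest scaled value

print(round(row_means, 10))
print(row_sds)
print(value_range)
```

Each row should come out with mean zero and standard deviation one, while the printed range shows that the individual values are not confined to the interval from 0 to 1.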

ADD COMMENT • link 4.1 years ago by ATpoint 88k

1

Thank you for your answer!

What confused me is the following sentence from the Seurat package: "Scales the expression of each gene, so that the variance across cells is 1", which made me believe that if the mean is 0 and the variance is 1, all the values should be between 0 and 1.

ADD REPLY • link 4.1 years ago by bs58 ▴ 10

0

Think about it: if the mean is zero and the variance is != 0, then there must be negative values, simply because of how mean and variance work mathematically.
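A tiny worked example may make this concrete. The vector x below is hypothetical and chosen only for easy arithmetic: once the mean is subtracted, the values sum to zero, so any positive deviation has to be balanced by negative ones.

```r
x <- c(0, 0, 0, 4)        # hypothetical expression of one gene in four cells
centered <- x - mean(x)   # subtract the mean (here 1)

print(centered)           # three negative deviations and one positive one
print(sum(centered))      # deviations from the mean always sum to zero
```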

ADD REPLY • link 4.1 years ago by ATpoint 88k

0

You're right! But then the values should be between -1 and 1, right? And not between -14 and 10.

ADD REPLY • link 4.1 years ago by bs58 ▴ 10

2

No. You can simulate this with hist(rnorm(1000, 0, 1)): a variance of 1 describes the overall spread and does not bound the values to the range from -1 to 1. Some outliers will probably produce these extreme values, but if the sample size is large enough they have only a modest influence on the total variance.
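For single-cell data, the effect is often stronger than in a normal distribution, because many genes are zero in most cells. The sketch below uses a hypothetical gene that is expressed in one cell out of 1000 (n_cells and sparse_gene are illustrative names, not from the question). It also mimics an upper cap on the scaled values: as far as I know, ScaleData() has a scale.max argument that defaults to 10, which would explain a maximum of exactly 10 and an overall mean slightly below zero. Treat that link to the numbers in the question as an assumption, not a confirmed diagnosis.

```r
n_cells <- 1000
sparse_gene <- c(rep(0, n_cells - 1), 1)   # hypothetical: expressed in a single cell

# z-score: (x - mean) / sd, which is what scale() computes
z <- as.numeric(scale(sparse_gene))

var_z <- var(z)   # the variance is 1 after scaling
max_z <- max(z)   # equals (n_cells - 1) / sqrt(n_cells), far above 1

# Assumption: emulate an upper cap like scale.max = 10 with pmin()
z_capped    <- pmin(z, 10)
mean_capped <- mean(z_capped)   # capping the top pulls the mean below zero

print(c(variance = var_z, max = max_z, mean_after_cap = mean_capped))
```

The single expressing cell ends up many standard deviations above the mean even though the gene's variance is exactly 1. Large negative values arise the opposite way: a gene that is high in almost every cell and zero in a few.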

ADD REPLY • link 4.1 years ago by ATpoint 88k