Seurat scaling values not between 0 and 1
2.8 years ago
bs58 ▴ 10

I've followed the Seurat vignette tutorial for pre-processing my scRNAseq data.

When I look at the scaled data (using the ScaleData() function), I get values between -14.18 and 10, with an average of -0.005.

Seurat vignette says:

Shifts the expression of each gene, so that the mean expression across cells is 0
Scales the expression of each gene, so that the variance across cells is 1

This is what I did:

# Initialize the seurat object with the non normalized data
h5 <- CreateSeuratObject(counts = h5.data)
dataset = h5

#QC
dataset[["percent.mt"]] <- PercentageFeatureSet(dataset, pattern = "^MT-") # mitochondrial percentage

#Filtering
dataset <- subset(dataset, subset = nFeature_RNA > 250 & nFeature_RNA < 10000 & percent.mt < 15)

#Normalization
dataset <- NormalizeData(dataset, normalization.method = "LogNormalize", scale.factor = 10000)

#Most variable gene identification
dataset <- FindVariableFeatures(dataset, selection.method = "vst", nfeatures = 2000, verbose = FALSE, dispersion.cutoff = c(-Inf, 0.5), mean.cutoff = c(0.0125, 3))

#Scaling
all.genes <- rownames(dataset)
dataset <- ScaleData(dataset)

What did I do wrong?

scRNA-seq scaling seurat • 2.9k views
2.8 years ago
ATpoint 81k

Scaling means that each value is transformed to represent its deviation from the mean, in units of standard deviations: in this case the deviation of each cell from the mean of all cells, computed separately for every gene.
So you get both negative and positive values with a mean of zero (or approximately zero; of this I am not fully sure).
What you see is normal and expected, see also: https://en.wikipedia.org/wiki/Standard_score

You can see how this works by running this R code:

#/ make some dummy data with float values:
dummy_data <- sapply(1:5, function(x) rnorm(3,100,10))

#/ now scale them and then compare by eye what happened:
scaled <- t(scale(t(dummy_data)))

This produces a dummy count matrix with three "genes" and five "cells" and then scales it.
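
To check the result numerically rather than by eye, a small sketch along these lines may help. The seed and the names gene_means and gene_sds are only illustrative assumptions; everything else is base R:

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# 3 "genes" (rows) x 5 "cells" (columns), same construction as above
dummy_data <- sapply(1:5, function(x) rnorm(3, 100, 10))

# scale() standardizes columns, so transpose to standardize each gene (row)
scaled <- t(scale(t(dummy_data)))

gene_means <- rowMeans(scaled)      # per-gene mean after scaling
gene_sds   <- apply(scaled, 1, sd)  # per-gene standard deviation after scaling

print(round(gene_means, 10))  # zero up to floating-point error
print(gene_sds)               # one for every gene
print(range(scaled))          # the values themselves are not confined to [0, 1]
```

The per-gene mean and standard deviation are fixed by the scaling, but the range of the individual values is not.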


Thank you for your answer!

What confused me is the following sentence from the Seurat package: "Scales the expression of each gene, so that the variance across cells is 1", which made me believe that if the mean is 0 and the variance is 1, all the values should be between 0 and 1.


Think about it: if the mean is zero and the variance is not zero, then there must be negative values, simply because of how mean and variance work mathematically.


You're right! But then the values should be between -1 and 1, right? And not between -14 and 10.


No. You can simulate this with hist(rnorm(1000, 0, 1)): a variance of 1 does not confine the values to the interval from -1 to 1. Some outliers will probably produce these extreme values, but if the sample size is large enough, they have only a modest influence on the total variance.
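
A rough sketch of that simulation, extended with a hypothetical sparse "gene" to show how a handful of high-expressing cells can end up with very large scaled values. The seed, the toy vector gene, and the use of pmin() are assumptions for illustration. The clipping step only mimics what ScaleData() appears to do through its scale.max argument (default 10, as far as I know), which would explain a maximum of exactly 10 and a mean slightly below zero:

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Standard normal draws: mean ~0 and variance ~1, yet many values lie outside [-1, 1]
z <- rnorm(1000, 0, 1)
frac_outside <- mean(abs(z) > 1)
print(frac_outside)

# Hypothetical sparse gene: zero in 995 cells, expressed in only 5 cells
gene <- c(rep(0, 995), rep(5, 5))
gene_scaled <- as.numeric(scale(gene))
print(range(gene_scaled))  # the 5 expressing cells get a z-score well above 10

# Clip the upper end at 10, mimicking the effect of scale.max (assumption)
gene_clipped <- pmin(gene_scaled, 10)
print(max(gene_clipped))   # capped at 10
print(mean(gene_clipped))  # no longer exactly 0 after clipping
```

Because only the upper end is clipped in this sketch, the mean shifts slightly below zero, similar to the -0.005 reported in the question.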
