Normalization for comparing expression statistics across sample and gene groups
20 months ago


What is the current best practice for normalizing gene expression counts if I want to compare characteristics of genes, and particularly of gene groups (min, max, mean, sd of expression), between two conditions? I'm interested in questions like: "Is the variance of expression means larger in condition A than in condition B for a specific gene group?" For example, genes in group X have more variable mean expression in A than in B, while this is not true for gene group Y.

I guess I have to normalize for library size, gene length, and also correct for the mean-variance dependence of expression.

Maybe vst + rpkm or tpm transformation? Any other suggestions? Not sure if I can do an rpkm or tpm transformation after vst.
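For reference, here is a minimal base R sketch of the two length-aware measures mentioned above, computed from raw counts. The toy counts and the `gene_length` vector are made up for illustration and are not part of the original post. Both measures are defined on raw counts, whereas vst output is on a log2-like scale, which is why applying them after vst is not straightforward.

```r
# Hypothetical toy inputs: raw counts (genes x samples) and gene lengths in bp.
set.seed(1)
counts <- matrix(rpois(100, lambda = 10), ncol = 5,
                 dimnames = list(paste0("gene", 1:20),
                                 c("A1", "A2", "A3", "B1", "B2")))
gene_length <- sample(500:5000, 20)  # assumed lengths, not from the post

# TPM: divide counts by gene length in kb, then scale each sample to sum to 1e6.
rate <- counts / (gene_length / 1e3)
tpm  <- t(t(rate) / colSums(rate)) * 1e6

# RPKM: divide by library size in millions, then by gene length in kb.
rpkm <- t(t(counts) / (colSums(counts) / 1e6)) / (gene_length / 1e3)
```

TPM columns sum to one million by construction, so values are comparable as proportions within a sample; RPKM columns generally do not share a common sum.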

This is an example dataset:

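# 20 genes x 5 samples of simulated Poisson counts
# (the start of this call was truncated; as.data.frame(matrix(rpois(100, ...))) is a reconstruction)
data <- as.data.frame(matrix(rpois(100, lambda = 10), ncol = 5))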
colnames(data) <- c("A1", "A2", "A3", "B1", "B2")
genes <- paste0("gene", 1:20)
gene_group <- c(rep("X", 15), rep("Y", 5))
data <- cbind(data, genes, gene_group)
16 months ago

As you suggest, I would use vst and then normalize for gene length afterwards (vst / gene_length * 1e3). You could check with the cqn package whether GC normalisation seems to be needed.
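The workflow described above could be sketched roughly as follows in base R. This is only an illustration under stated assumptions: `log2(cpm + 1)` is used as a simple stand-in for vst (in practice DESeq2's `vst` would replace that step), and the counts, the `gene_length` vector, and the condition labels are hypothetical toy values modelled on the example dataset in the question.

```r
# Toy data modelled on the question: 20 genes, 3 samples in A and 2 in B.
set.seed(1)
samples   <- c("A1", "A2", "A3", "B1", "B2")
condition <- c("A", "A", "A", "B", "B")
counts <- matrix(rpois(100, lambda = 10), ncol = 5,
                 dimnames = list(paste0("gene", 1:20), samples))
gene_group  <- c(rep("X", 15), rep("Y", 5))
gene_length <- sample(500:5000, 20)  # hypothetical lengths in bp

# Stand-in for vst (assumption): log2 counts-per-million with a pseudocount of 1.
cpm <- t(t(counts) / colSums(counts)) * 1e6
transformed <- log2(cpm + 1)

# Length scaling as written in the answer: value / gene_length * 1e3.
scaled <- transformed / gene_length * 1e3

# Per-gene mean expression within each condition.
mean_A <- rowMeans(scaled[, condition == "A"])
mean_B <- rowMeans(scaled[, condition == "B"])

# Variance of the per-gene means within each gene group, per condition.
group_var <- data.frame(
  var_A = vapply(split(mean_A, gene_group), var, numeric(1)),
  var_B = vapply(split(mean_B, gene_group), var, numeric(1))
)
print(group_var)
```

Each row of `group_var` is one gene group, so comparing `var_A` with `var_B` within a row addresses the question of whether mean expression is more variable in one condition for that group.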

