Question

Normalization for comparing expression statistics across sample and gene groups

4.1 years ago

endre.sebestyen ▴ 10

Hi,

What is the current best practice for normalizing gene expression counts if I want to compare characteristics of genes, and particularly of gene groups (min, max, mean, sd of expression), between two conditions? I'm interested in questions like: "Is the variance of the expression means larger in condition A than in condition B for a specific gene group?" For example, genes in group X have more variable mean expression in A than in B, while this is not true for gene group Y.

I guess I have to normalize for library size, gene length, and also correct for the mean-variance dependence of expression.

Maybe vst followed by an RPKM or TPM transformation? Any other suggestions? I'm not sure whether an RPKM or TPM transformation can be applied after vst.

This is an example dataset:

data <- as.data.frame(matrix(rpois(100, lambda = 10), ncol = 5))
colnames(data) <- c("A1", "A2", "A3", "B1", "B2")
genes <- paste0("gene", 1:20)
gene_group <- c(rep("X", 15), rep("Y", 5))
data <- cbind(data, genes, gene_group)
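
To make the question concrete, here is a rough base-R sketch of the statistic being asked about, the variance of per-gene mean expression within each gene group and condition. It rebuilds the same toy counts as a matrix. The fixed seed, the `condition` vector, and the use of log2 CPM are assumptions made for illustration only. Log2 CPM corrects for library size but not for gene length or the mean-variance trend, so it stands in for whatever normalization is finally chosen.

```r
set.seed(1)  # assumption: fixed seed so the toy data are reproducible

# Same toy counts as above, kept as a genes x samples matrix
counts <- matrix(rpois(100, lambda = 10), ncol = 5,
                 dimnames = list(paste0("gene", 1:20),
                                 c("A1", "A2", "A3", "B1", "B2")))
gene_group <- c(rep("X", 15), rep("Y", 5))
condition  <- c("A", "A", "A", "B", "B")  # assumed sample-to-condition mapping

# Library-size normalization only: log2 counts per million with a pseudocount
cpm     <- t(t(counts) / colSums(counts)) * 1e6
log_cpm <- log2(cpm + 1)

# Per-gene mean expression within each condition
mean_A <- rowMeans(log_cpm[, condition == "A"])
mean_B <- rowMeans(log_cpm[, condition == "B"])

# Variance of the per-gene means inside each gene group, per condition
var_of_means <- rbind(
  A = vapply(split(mean_A, gene_group), var, numeric(1)),
  B = vapply(split(mean_B, gene_group), var, numeric(1))
)
print(var_of_means)
```

The result is a 2 x 2 matrix (conditions in rows, gene groups in columns). Comparing the A and B entries within a column is the comparison described above. Whether a difference is meaningful would still need a proper test or resampling, which this sketch does not attempt.
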

RNA-Seq normalization gene sets • 1.0k views

score 0 · Answer 1 · 2020-08-07

3.7 years ago

Kristoffer Vitting-Seerup ★ 4.0k

As you suggest, I would use vst and then normalize for gene length afterwards (vst / gene_length * 1e3). You could also check with the cqn package whether GC-content normalisation seems to be needed.
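
The length step can be sketched in base R as below. Everything here is illustrative: `vsd_mat` is a made-up matrix standing in for real VST output (in DESeq2 that would be something like `assay(varianceStabilizingTransformation(dds))`; `vst()` itself subsamples 1000 genes by default, so it is not suited to a 20-gene toy set), and `gene_length` is a hypothetical vector of lengths in bp. The first line applies the formula exactly as written above. The second is an alternative worth considering, not part of the suggestion above: since VST values are on a log2-like scale, a length correction on that scale would be a subtraction of log2(length in kb) rather than a division.

```r
set.seed(1)  # assumption: fixed seed for a reproducible toy matrix

# Hypothetical stand-in for VST output: genes x samples, log2-like values
vsd_mat <- matrix(rnorm(20, mean = 8, sd = 1), ncol = 5,
                  dimnames = list(paste0("gene", 1:4),
                                  c("A1", "A2", "A3", "B1", "B2")))

# Hypothetical gene lengths in base pairs, in the same row order as vsd_mat
gene_length <- c(gene1 = 500, gene2 = 1000, gene3 = 2000, gene4 = 4000)
stopifnot(identical(names(gene_length), rownames(vsd_mat)))

# Formula as written in the answer: vst / gene_length * 1e3.
# A vector of length nrow() recycles down the columns, so each row (gene)
# is divided by its own length.
vst_per_kb <- vsd_mat / gene_length * 1e3

# Alternative on the log2 scale: subtract log2 of the length in kb
vst_minus_loglen <- vsd_mat - log2(gene_length / 1e3)
```

A 1 kb gene (`gene2` here) is left unchanged by both versions. For other lengths the two corrections differ, so it is worth checking on real data which one actually removes the length dependence before comparing gene groups.
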