Normalization for comparing expression statistics across sample and gene groups
12 months ago

Hi,

What is the current best practice for normalizing gene expression counts if I want to compare characteristics of genes, and particularly of gene groups (min, max, mean and sd of expression), between two conditions? I'm interested in questions like: "Is the variance of the expression means in condition A larger than in condition B for a specific gene group?" For example, genes in group X might have more variable mean expression in A than in B, while this is not true for gene group Y.

I guess I have to normalize for library size, gene length, and also correct for the mean-variance dependence of expression.
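For the library-size part, a minimal base-R sketch of median-of-ratios size factors (the idea behind DESeq2's `estimateSizeFactors()`) might look like this. It is a re-implementation for illustration only, not DESeq2's code; the function name `median_ratio_size_factors` is hypothetical. It handles library size only, not gene length or the mean-variance dependence.

```r
# Median-of-ratios size factors, written in base R for illustration only.
median_ratio_size_factors <- function(counts) {
  counts <- as.matrix(counts)
  log_counts <- log(counts)
  # Per-gene geometric mean on the log scale; genes with any zero count give -Inf
  log_geo_means <- rowMeans(log_counts)
  keep <- is.finite(log_geo_means)
  # Log-ratio of each count to its gene's geometric mean (vector recycles per row)
  log_ratios <- log_counts[keep, , drop = FALSE] - log_geo_means[keep]
  # Size factor per sample = median ratio, back on the linear scale
  exp(apply(log_ratios, 2, median))
}

set.seed(1)
counts <- matrix(rpois(100, lambda = 10), ncol = 5,
                 dimnames = list(paste0("gene", 1:20),
                                 c("A1", "A2", "A3", "B1", "B2")))
sf <- median_ratio_size_factors(counts)
# Divide each sample (column) by its size factor
norm_counts <- sweep(counts, 2, sf, "/")
```

Using the median of the ratios, rather than total counts, keeps a few very highly expressed genes from dominating the scaling.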

Maybe VST plus an RPKM or TPM transformation? Any other suggestions? I'm not sure whether an RPKM or TPM transformation can be done after VST.
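For reference on the ordering question: TPM is defined on counts (divide by gene length in kilobases, then rescale each sample to sum to one million), so it is computed from counts rather than from log-scale VST values. A minimal sketch, assuming gene lengths in base pairs are available; the lengths below are random placeholders and `counts_to_tpm` is a hypothetical helper name.

```r
# TPM from counts: reads per kilobase, then rescale each sample to sum to 1e6.
counts_to_tpm <- function(counts, length_bp) {
  counts <- as.matrix(counts)
  stopifnot(length(length_bp) == nrow(counts))
  rate <- counts / (length_bp / 1e3)   # length vector recycles down rows, one per gene
  sweep(rate, 2, colSums(rate), "/") * 1e6
}

set.seed(1)
counts <- matrix(rpois(100, lambda = 10), ncol = 5,
                 dimnames = list(paste0("gene", 1:20),
                                 c("A1", "A2", "A3", "B1", "B2")))
# Hypothetical gene lengths in base pairs (placeholders, not from the post)
gene_length <- round(runif(20, min = 500, max = 5000))
tpm <- counts_to_tpm(counts, gene_length)
colSums(tpm)  # each sample sums to 1e6 (up to floating-point rounding)
```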

This is an example dataset:

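set.seed(1)  # make the random example reproducible
# 20 genes x 5 samples of Poisson counts: three samples in condition A, two in B
data <- as.data.frame(matrix(rpois(100, lambda = 10), ncol = 5))
colnames(data) <- c("A1", "A2", "A3", "B1", "B2")
genes <- paste0("gene", 1:20)
gene_group <- c(rep("X", 15), rep("Y", 5))  # 15 genes in group X, 5 in group Y
data <- cbind(data, genes, gene_group)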
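Once the values are normalized, the group statistics could be computed roughly like this. This is a sketch: it rebuilds the numeric part of the example data, and `log2(x + 1)` is only a stand-in for properly normalized, variance-stabilized values so the code runs in base R. It takes the per-gene mean within each condition, then summarises those means per gene group.

```r
set.seed(1)
data <- as.data.frame(matrix(rpois(100, lambda = 10), ncol = 5))
colnames(data) <- c("A1", "A2", "A3", "B1", "B2")
genes <- paste0("gene", 1:20)
gene_group <- c(rep("X", 15), rep("Y", 5))

# Stand-in for normalized, variance-stabilized values (e.g. VST output)
expr <- log2(as.matrix(data) + 1)
rownames(expr) <- genes
condition <- substr(colnames(expr), 1, 1)  # "A" or "B", taken from the sample names

# Per-gene mean within each condition: a 20 x 2 matrix with columns A and B
gene_means <- sapply(split(colnames(expr), condition),
                     function(cols) rowMeans(expr[, cols, drop = FALSE]))

# min, max, mean, sd and variance of the gene means, per gene group and condition
combos <- expand.grid(group = unique(gene_group), condition = colnames(gene_means),
                      stringsAsFactors = FALSE)
stats <- do.call(rbind, lapply(seq_len(nrow(combos)), function(i) {
  x <- gene_means[gene_group == combos$group[i], combos$condition[i]]
  data.frame(group = combos$group[i], condition = combos$condition[i],
             min = min(x), max = max(x), mean = mean(x), sd = sd(x), var = var(x))
}))
stats

# Variance of the gene means in A relative to B, for each gene group
var_of_means <- sapply(split(seq_len(nrow(gene_means)), gene_group),
                       function(idx) apply(gene_means[idx, , drop = FALSE], 2, var))
var_of_means["A", ] / var_of_means["B", ]
```

A ratio above 1 for a group means its gene means are more spread out in A than in B; this is a descriptive number only.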

Tags: RNA-Seq, normalization, gene sets
8 months ago

As you suggest, I would use vst and then additionally normalize for gene length (vst / gene_length * 1e3). You could check with the cqn package whether GC-content normalization is also needed.
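One detail worth keeping in mind when combining the two steps: DESeq2's `vst()` returns values on an approximately log2 scale, so a per-kilobase length correction on that scale is a subtraction of `log2(length / 1000)` rather than a division. A minimal sketch under that assumption; `log2(counts + 1)` stands in for the VST matrix so the code runs without DESeq2, the gene lengths are random placeholders, and `length_correct_log2` is a hypothetical helper name.

```r
# Length correction for a matrix on a log2 scale (e.g. VST output).
# Dividing by length in kb on the linear scale equals subtracting log2(kb) here.
length_correct_log2 <- function(log2_mat, length_bp) {
  stopifnot(length(length_bp) == nrow(log2_mat))
  log2_mat - log2(length_bp / 1e3)   # vector recycles down rows, one value per gene
}

set.seed(1)
counts <- matrix(rpois(100, lambda = 10), ncol = 5,
                 dimnames = list(paste0("gene", 1:20),
                                 c("A1", "A2", "A3", "B1", "B2")))
# Stand-in for VST output; with DESeq2 one would use assay(vst(dds)), or
# assay(varianceStabilizingTransformation(dds)) for very small gene sets.
vst_like <- log2(counts + 1)
gene_length <- round(runif(20, min = 500, max = 5000))  # hypothetical lengths (bp)
vst_len <- length_correct_log2(vst_like, gene_length)
```

A gene of exactly 1 kb is left unchanged, and a 2 kb gene is shifted down by 1 (one halving on the log2 scale).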