Normalization for comparing expression statistics across sample and gene groups
4.1 years ago

Hi,

What would be the current best practice for normalizing gene expression counts if I want to compare characteristics of genes, and particularly of gene groups (min, max, mean and sd of expression), between two conditions? I'm interested in questions like: "Is the variance of the per-gene mean expression in condition A larger than in condition B for a specific gene group?" That is, the genes in group X have more variable mean expression in A than in B, while this is not true for gene group Y.
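One way to write down the quantity in the question is shown below. The notation is introduced here and is not from the original post: $y_{gs}$ is the normalized expression of gene $g$ in sample $s$, $S_C$ is the set of samples in condition $C \in \{A, B\}$, and $G$ is a gene group.

```latex
\bar{y}_{gC} = \frac{1}{|S_C|}\sum_{s \in S_C} y_{gs},
\qquad
\bar{y}_{C}(G) = \frac{1}{|G|}\sum_{g \in G} \bar{y}_{gC}

V_C(G) = \frac{1}{|G|-1}\sum_{g \in G}\left(\bar{y}_{gC} - \bar{y}_{C}(G)\right)^2

\text{Question: is } V_A(X) > V_B(X) \text{ while } V_A(Y) \approx V_B(Y)\,?
```

Because $V_C(G)$ is computed from the $y_{gs}$, any residual dependence of the variance on the mean or on library size feeds directly into it. That is why the normalization step matters for this comparison.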

I guess I have to normalize for library size, gene length, and also correct for the mean-variance dependence of expression.
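For the library-size part, the sketch below shows a "median of ratios" size factor computed in base R. To my understanding, this is roughly what DESeq2's `estimateSizeFactors()` does by default. It assumes `counts` is a genes x samples matrix of raw counts. The helper name `median_of_ratios` is hypothetical, and the doubled `B2` column is only there to simulate a deeper library.

```r
# Sketch of "median of ratios" size factors (DESeq2-style) in base R.
# Assumption: `counts` is a genes x samples matrix of raw counts.
median_of_ratios <- function(counts) {
  log_counts <- log(counts)
  # per-gene geometric mean on the log scale; genes with a zero give -Inf
  log_geo_means <- rowMeans(log_counts)
  keep <- is.finite(log_geo_means)
  # per sample: median log ratio to the pseudo-reference, back-transformed
  apply(log_counts[keep, , drop = FALSE], 2, function(col) {
    exp(median(col - log_geo_means[keep]))
  })
}

set.seed(1)
counts <- matrix(rpois(100, lambda = 10), ncol = 5,
                 dimnames = list(paste0("gene", 1:20),
                                 c("A1", "A2", "A3", "B1", "B2")))
# hypothetical: make B2 a "deeper" library by doubling its counts
counts[, "B2"] <- counts[, "B2"] * 2L

sf <- median_of_ratios(counts)
norm_counts <- sweep(counts, 2, sf, "/")   # divide each sample by its factor
print(round(sf, 2))
```

This step only handles sequencing depth. Gene length and the mean-variance trend are separate steps.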

Maybe VST plus an RPKM or TPM transformation? Any other suggestions? I'm not sure whether I can apply an RPKM or TPM transformation after VST.
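TPM is defined on counts: reads per kilobase, rescaled so that each sample sums to one million. It is therefore computed from the raw counts rather than from VST output, which is already on a log2-like scale.

Below is a minimal base-R sketch. The helper name `counts_to_tpm` is hypothetical, and the gene lengths are made up for illustration.

```r
# Sketch: TPM from raw counts and gene lengths in base R.
# Assumption: gene_length is in base pairs; the values here are made up.
counts_to_tpm <- function(counts, gene_length) {
  stopifnot(length(gene_length) == nrow(counts))
  # reads per kilobase; the length vector recycles down each column (per gene)
  rate <- counts / (gene_length / 1e3)
  # rescale each sample so its values sum to one million
  sweep(rate, 2, colSums(rate), "/") * 1e6
}

set.seed(1)
counts <- matrix(rpois(100, lambda = 10), ncol = 5,
                 dimnames = list(paste0("gene", 1:20),
                                 c("A1", "A2", "A3", "B1", "B2")))
gene_length <- round(runif(20, min = 500, max = 5000))  # hypothetical lengths

tpm <- counts_to_tpm(counts, gene_length)
colSums(tpm)   # each column sums to 1e6
# a simple log transform damps, but does not remove, the mean-variance trend
log_tpm <- log2(tpm + 1)
```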

This is an example dataset:

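set.seed(1)  # make the simulated counts reproducible
# 20 genes x 5 samples of Poisson counts; A1-A3 = condition A, B1-B2 = condition B
data <- as.data.frame(matrix(rpois(100, lambda = 10), ncol = 5))
colnames(data) <- c("A1", "A2", "A3", "B1", "B2")
genes <- paste0("gene", 1:20)
gene_group <- c(rep("X", 15), rep("Y", 5))   # 15 genes in group X, 5 in group Y
data <- cbind(data, genes, gene_group)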
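On a dataset shaped like this one, the summaries from the question can be computed in two steps. First, compute per-gene min, max, mean and sd within each condition. Second, take the variance of the per-gene means within each gene group.

The sketch below assumes `expr` already holds normalized values (for example, VST output). Here, `log2(counts + 1)` is used only as a stand-in so the example runs on its own. The condition is taken from the first letter of the sample name.

```r
# Sketch: per-gene summaries per condition, then variance of per-gene means
# per gene group. log2(counts + 1) is only a stand-in for normalized values.
set.seed(1)
data <- as.data.frame(matrix(rpois(100, lambda = 10), ncol = 5))
colnames(data) <- c("A1", "A2", "A3", "B1", "B2")
genes <- paste0("gene", 1:20)
gene_group <- c(rep("X", 15), rep("Y", 5))

expr <- log2(as.matrix(data) + 1)
rownames(expr) <- genes
condition <- substr(colnames(expr), 1, 1)   # "A" or "B" from the sample name

# per-gene min, max, mean and sd within each condition (long format)
gene_stats <- do.call(rbind, lapply(c("A", "B"), function(cond) {
  x <- expr[, condition == cond, drop = FALSE]
  data.frame(gene = genes, group = gene_group, condition = cond,
             min = apply(x, 1, min), max = apply(x, 1, max),
             mean = rowMeans(x), sd = apply(x, 1, sd),
             row.names = NULL)
}))

# variance of the per-gene means: rows = gene group, columns = condition
var_of_means <- tapply(gene_stats$mean,
                       list(gene_stats$group, gene_stats$condition), var)
print(var_of_means)
```

With only two or three samples per condition, the per-gene means and sds are themselves noisy. Differences between conditions in `var_of_means` should be interpreted with that in mind.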
3.7 years ago

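I would, as you suggest, use vst and then normalize for gene length as well. Because vst values are on a log2 scale, the length correction there is a subtraction rather than a division ( vst - log2(gene_length / 1e3) ), which is the log-scale equivalent of dividing by gene length in kb. You could check with the cqn package whether GC normalisation seems to be needed.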
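DESeq2's `vst()` returns values on an approximately log2 scale. On that scale, a per-kilobase length adjustment is a subtraction of `log2(gene_length / 1e3)` rather than a division.

The minimal base-R sketch below checks that identity. It uses `log2(counts + 1)` as a stand-in for VST output (with DESeq2 one would use `assay(vst(dds))`), so it runs without Bioconductor. The gene lengths are made up.

```r
# Sketch: gene-length adjustment on a log2 scale.
# Assumptions: log2(counts + 1) stands in for vst() output; gene lengths are
# hypothetical, in base pairs.
set.seed(1)
counts <- matrix(rpois(100, lambda = 10), ncol = 5,
                 dimnames = list(paste0("gene", 1:20),
                                 c("A1", "A2", "A3", "B1", "B2")))
gene_length <- round(runif(20, min = 500, max = 5000))

log_expr <- log2(counts + 1)                 # stand-in for vst values
# on the log2 scale, "divide by length in kb" becomes "subtract log2(kb)";
# the length vector recycles down each column, so each gene gets its own length
log_expr_per_kb <- log_expr - log2(gene_length / 1e3)

# same result as dividing on the linear scale first and then taking log2
check <- log2((counts + 1) / (gene_length / 1e3))
all.equal(log_expr_per_kb, check)
```

Since the adjustment shifts every sample of a gene by the same constant, it leaves within-gene comparisons between conditions unchanged. It only matters when comparing expression levels across genes, as in the group-level summaries in the question.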
