What is the current best practice for normalizing gene expression counts if I want to compare different characteristics of genes, and particularly of gene groups (min, max, mean, sd of expression), between two conditions? I'm interested in questions like: "Is the variance of expression means in condition A larger than in condition B for a specific gene group?" In other words, genes in group X have more variable mean expression in A than in B, while this is not true for gene group Y.
I guess I have to normalize for library size, gene length, and also correct for the mean-variance dependence of expression.
TPM transformation? Any other suggestions? Not sure if I can do an [...] TPM transformation after [...]
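For reference, this is how I understand TPM, as a minimal base R sketch. The toy counts, the `gene_length_bp` vector, and the `counts_to_tpm` helper are made up for illustration; real gene lengths would have to come from the annotation. As far as I can tell, this handles library size and gene length but does nothing about the mean-variance dependence.

```r
# Toy count matrix: 4 genes x 2 samples (made-up values)
counts <- matrix(c(10, 20, 30, 40,
                   5, 10, 15, 70),
                 ncol = 2,
                 dimnames = list(paste0("gene", 1:4), c("A1", "B1")))

# Hypothetical gene lengths in base pairs (assumption, not real annotation)
gene_length_bp <- c(1000, 2000, 1500, 500)

# Hypothetical helper: convert raw counts to transcripts per million
counts_to_tpm <- function(counts, length_bp) {
  # Reads per kilobase: row i is divided by the length of gene i
  rpk <- counts / (length_bp / 1000)
  # Scale each sample (column) so that it sums to one million
  sweep(rpk, 2, colSums(rpk), "/") * 1e6
}

tpm <- counts_to_tpm(counts, gene_length_bp)
colSums(tpm)  # every sample sums to 1e6 by construction
```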
This is an example dataset:
data <- as.data.frame(matrix(rpois(100, lambda = 10), ncol = 5))
colnames(data) <- c("A1", "A2", "A3", "B1", "B2")
genes <- paste0("gene", 1:20)
gene_group <- c(rep("X", 15), rep("Y", 5))
data <- cbind(data, genes, gene_group)
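To make the question concrete, here is a rough sketch of the summaries I'm after. It uses log2 counts per million purely as a placeholder for whatever normalization turns out to be appropriate, not as a recommendation. The seed, the pseudocount of 1, and the `group_stats` helper are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed so the toy data are reproducible
counts <- matrix(rpois(100, lambda = 10), ncol = 5,
                 dimnames = list(paste0("gene", 1:20),
                                 c("A1", "A2", "A3", "B1", "B2")))
gene_group <- c(rep("X", 15), rep("Y", 5))

# Placeholder normalization: log2 counts per million with a pseudocount of 1
cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6
log_cpm <- log2(cpm + 1)

# Per-gene mean expression within each condition
mean_A <- rowMeans(log_cpm[, c("A1", "A2", "A3")])
mean_B <- rowMeans(log_cpm[, c("B1", "B2")])

# Hypothetical helper: min, max, mean, sd of the per-gene means in each gene group
group_stats <- function(x, group) {
  t(sapply(split(x, group),
           function(v) c(min = min(v), max = max(v), mean = mean(v), sd = sd(v))))
}

stats_A <- group_stats(mean_A, gene_group)
stats_B <- group_stats(mean_B, gene_group)

# Ratio of between-gene variances of mean expression (A over B), per gene group
var_ratio <- (stats_A[, "sd"] / stats_B[, "sd"])^2
```

One caveat with this sketch: condition B has two replicates and A has three, so the per-gene means in B are noisier, which by itself affects their spread.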