Question

Is the normalization of my RNA-seq raw counts applied correctly?



3.0 years ago

kng

I used calcNormFactors from edgeR to normalize my RNA-seq data matrix of raw counts. Here is the code I am using; d_des is my design matrix and cm is my contrast matrix:

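library(data.table)   # raw_counts_dt is a data.table
library(magrittr)     # provides the %<>% assignment pipe
library(edgeR)        # DGEList, calcNormFactors
library(limma)        # voom, lmFit, contrasts.fit, eBayes
counts_matrix <- as.matrix(raw_counts_dt, rownames = TRUE)  # key (or first) column becomes gene rownames
de_List <- DGEList(counts_matrix)
de_List %<>% calcNormFactors                                 # TMM is the default method
res_Voom <- voom(de_List, d_des, plot = TRUE)
lm_Fit <- lmFit(res_Voom, d_des)
eb_Fit <- eBayes(contrasts.fit(lm_Fit, cm), trend = FALSE, robust = TRUE)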

I have the following questions:

If I understand the documentation correctly, TMM is the default normalization method. Is this the correct method for normalizing raw counts for DE analysis? If not, which one should be used?

What exactly is being normalized? Is it normalizing all genes in my list across all samples, i.e. across the replicates of each group?
Does voom use the normalized counts from my de_List, or is it still using the raw counts? If it uses the raw counts, how can I make it use the normalized counts?
I am using the res_Voom$E matrix to create a PCA plot comparing my conditions. Is this based on the normalized counts or the raw counts? If it is not normalized, how do I use the normalized counts?
I am using topTable on eb_Fit to list the top differentially expressed genes. Is this the correct table to use?

Thank you in advance for your answers!

Tags: limma-voom, edgeR, normalization, raw-counts, RNAseq

Written 3.0 years ago by kng; last updated 3.0 years ago by Gordon Smyth

score 1 · Accepted Answer · 2022-11-04

See

https://www.bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/limmaWorkflow.html

for an example limma-voom analysis with detailed explanations of all the steps. You might also find this post helpful:

How do you generate TMM normalized counts using EdgeR?
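For illustration, here is a minimal sketch of what calcNormFactors produces, using simulated counts as a stand-in for raw_counts_dt (the matrix `counts`, its size, and the gene and sample names are hypothetical). It shows that TMM gives one scaling factor per sample, not per gene, and that edgeR's cpm() divides each column by the effective library size, lib.size × norm.factors. The manual recomputation is only a sanity check and assumes cpm()'s default of using the normalized library sizes.

```r
library(edgeR)

# Hypothetical simulated counts: 1000 genes x 6 samples with differing depths
set.seed(1)
depth <- rep(c(40, 50, 60, 45, 55, 50), each = 1000)
counts <- matrix(rnbinom(6000, mu = depth, size = 10), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))

dge <- DGEList(counts)
dge <- calcNormFactors(dge)   # method = "TMM" is the default

# One scaling factor per sample; TMM rescales them to a geometric mean of 1
print(dge$samples$norm.factors)

# TMM-normalized counts per million, as returned by edgeR
norm_cpm <- cpm(dge)

# The same values by hand: counts / (lib.size * norm.factors) * 1e6
eff_lib <- dge$samples$lib.size * dge$samples$norm.factors
manual_cpm <- t(t(counts) / eff_lib) * 1e6
print(all.equal(norm_cpm, manual_cpm, check.attributes = FALSE))
```

The normalization therefore adjusts each sample's library size as a whole; genes are never rescaled individually.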

The short answer, though, is that everything happens correctly in the standard limma-voom pipeline and you can simply use the defaults. TMM is the default for calcNormFactors because that is what we recommend for bulk RNA-seq analyses. voom automatically detects the TMM normalization factors stored in the DGEList and folds them into the library sizes, so the log-CPM values in res_Voom$E are already normalized. Yes, topTable is the standard way of extracting DE genes.
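A minimal sketch to check this on hypothetical simulated data with a two-group design (the group labels A/B and the contrast name BvsA are made up for illustration). The hand-computed matrix assumes voom's log-CPM definition, log2((count + 0.5) / (effective library size + 1) × 10^6), with the effective library size being lib.size × norm.factors; if it matches v$E, voom used the TMM-normalized library sizes. The last lines mirror the question's contrasts.fit/eBayes/topTable steps.

```r
library(edgeR)
library(limma)

# Hypothetical simulated counts: 1000 genes x 6 samples, two groups of 3
set.seed(1)
depth <- rep(c(40, 50, 60, 45, 55, 50), each = 1000)
counts <- matrix(rnbinom(6000, mu = depth, size = 10), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))

# Cell-means design and a B-vs-A contrast, as in the question's d_des and cm
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
cm <- makeContrasts(BvsA = B - A, levels = design)

dge <- calcNormFactors(DGEList(counts))   # TMM
v <- voom(dge, design)

# voom's log-CPM rebuilt by hand from the TMM-adjusted library sizes
eff_lib <- dge$samples$lib.size * dge$samples$norm.factors
manual_logcpm <- log2(t((t(counts) + 0.5) / (eff_lib + 1)) * 1e6)
print(all.equal(v$E, manual_logcpm, check.attributes = FALSE))

# Standard downstream steps; topTable extracts the ranked DE genes
fit <- eBayes(contrasts.fit(lmFit(v, design), cm), robust = TRUE)
top <- topTable(fit, coef = "BvsA", number = 10)
print(top)
```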

It would be better not to "hack" the voom object to make PCA plots. Instead, use the approach recommended in the workflow linked above.
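As a rough sketch of that approach, assuming it matches the workflow's use of TMM-normalized log-CPM values from cpm(..., log = TRUE) for an MDS plot, here it is on the same kind of hypothetical simulated data. A classical PCA via prcomp on the same matrix is shown as an alternative; prcomp is base R and is an illustrative substitute, not something the workflow prescribes.

```r
library(edgeR)
library(limma)

# Hypothetical simulated counts: 1000 genes x 6 samples, two groups of 3
set.seed(1)
depth <- rep(c(40, 50, 60, 45, 55, 50), each = 1000)
counts <- matrix(rnbinom(6000, mu = depth, size = 10), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))

dge <- calcNormFactors(DGEList(counts, group = group))

# TMM-normalized log2-CPM (cpm adds a small prior count before taking logs)
lcpm <- cpm(dge, log = TRUE)

# MDS plot of samples, coloured by group
plotMDS(lcpm, labels = as.character(group), col = as.integer(group))

# Alternative: PCA with samples as rows
pca <- prcomp(t(lcpm))
print(summary(pca)$importance[2, 1:2])   # proportion of variance, PC1 and PC2
```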