I am learning "ridge.net" to study gene-gene interaction (GGI). ridge.net can generate a partial correlation matrix (PCM) from gene expression levels (RNA-seq gene counts), and the PCM can be used to build a GGI network. However, I assume the raw or expected gene counts need to be normalized before creating the PCM to get better results.
I have to admit that I fell in love with edgeR for RNA-seq analysis. It's really powerful. So I am wondering whether edgeR can be used to normalize the raw or expected gene counts before creating a PCM. A few questions about this normalization:
- Is CPM a good normalization for this purpose? It seems CPM only rescales the original gene counts. What about the fancy Bayes shrinkage, dispersion estimation, and so on? Aren't these important as well?
- When should "logcpm <- cpm(y, prior.count=2, normalized.lib.sizes=TRUE, log=TRUE)" be called? Right after
    y <- DGEList(counts=d)
  or after
    y <- calcNormFactors(y)  # global normalization
  or even after
    y <- estimateDisp(y, dsgn)
- Is logCPM better than CPM for computing the PCM and later deriving the GGI network? I am inclined towards logCPM.
  On the other hand, logCPM might not be good, because ERCC.00112 (an ERCC spike-in control) showed up in the network with the strongest edges (branch factor 3~5).
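To make the three candidate call sites concrete, here is a minimal sketch on simulated counts. The toy matrix `d`, the two-group layout, and the design `dsgn` are made up for illustration and are not my real data. As far as I understand, cpm() only uses the library sizes and normalization factors stored in `y`, so I would expect the call after estimateDisp() to give the same values as the call after calcNormFactors(); the sketch lets that be checked.

```r
library(edgeR)

set.seed(1)

# Hypothetical toy data: 200 genes x 6 samples of negative-binomial counts
d <- matrix(rnbinom(200 * 6, mu = 50, size = 5), nrow = 200,
            dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))

# Hypothetical two-group design (3 samples per group)
group <- factor(rep(c("A", "B"), each = 3))
dsgn  <- model.matrix(~ group)

# Option 1: right after DGEList(); norm.factors are still all 1
y <- DGEList(counts = d)
logcpm_raw <- cpm(y, prior.count = 2, normalized.lib.sizes = TRUE, log = TRUE)

# Option 2: after calcNormFactors(); effective library sizes are now used
y <- calcNormFactors(y)
logcpm_tmm <- cpm(y, prior.count = 2, normalized.lib.sizes = TRUE, log = TRUE)

# Option 3: after estimateDisp(); dispersions are added to y
y <- estimateDisp(y, dsgn)
logcpm_disp <- cpm(y, prior.count = 2, normalized.lib.sizes = TRUE, log = TRUE)

# Largest absolute difference between the options
print(max(abs(logcpm_tmm - logcpm_raw)))   # option 2 vs option 1
print(max(abs(logcpm_disp - logcpm_tmm)))  # option 3 vs option 2
```

If the second difference is zero, the dispersion estimate plays no role in the logCPM values, and the matrix from option 2 would be the one to pass on to the PCM step. I am assuming ridge.net wants samples in rows, in which case the input would be t(logcpm_tmm).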
Any insight?