Question: WGCNA analysis in normal and tumor case
gravatar for shivangi.agarwal800
21 months ago by
shivangi.agarwal80070 wrote:


I want to perform WGCNA analysis, i am having RNA seq data (gene expression) in normal and tumor cases. I want to compare between control and tumor case with respect to modules, module-trait relationship etc. How to input control and tumor data to WGCNA?. Can I upload them by keeping in same excel sheet.? But then how it would recognize them. Please suggest. Thanks in advance.m

modules wgcna • 1.5k views
ADD COMMENTlink modified 21 months ago by Kevin Blighe61k • written 21 months ago by shivangi.agarwal80070
gravatar for Kevin Blighe
21 months ago by
Kevin Blighe61k
Kevin Blighe61k wrote:

You indicate that your data is currently stored in an Excel sheet, which is not typical. From where did you obtain it? Processed RNA-seq data is usually stored in a CSV or TSV file, and may be tarred and zipped and have the extension *.tar.gz. Note that storing and re-using data from an Excel sheet is not good practice. Excel adds a lot of formatting on top of your data and can even modify / manipulate it in ways that you may not anticipate.

I suggest that you go through the entire WGCNA Tutorial and then apply the methodology to your own data.

With tumour-normal data, you may consider one of the following:

  • run WGCNA separately for each of tumour and normal and then compare results
  • run WGCNA with tumour + normal together and then see which modules (and, from this, which genes contained within those modules) correlate with tumour-normal status.

If you honestly complete the online tutorial even once, you will have a better understanding. That's why the tutorial exists.


ADD COMMENTlink written 21 months ago by Kevin Blighe61k

Ya, I meant to say that only. My data is arranged in .tsv format containing gene name in column and sample id in rows. My concern was that if we want to study the difference between two groups (control and tumor) then should we input data containing both groups or we input them separately. But as per your second suggestion (run WGCNA with tumour + normal together and then see which modules (and, from this, which genes contained within those modules) correlate with tumour-normal status), if we run them together, how to compare results between two groups. Thanks in advance.

ADD REPLYlink modified 21 months ago • written 21 months ago by shivangi.agarwal80070

When you run WGCNA with tumour + normal together, the procedure after this is:

  1. for each module's eigenvalues, correlate (cor and cor.test) or regress (lm or glm) these to TumourNormal status (encoded as 1 and 0)
  2. identify modules that are statistically significant from point number 1
  3. perform literature search or pathway analysis on genes in the modules identified in point number 2
ADD REPLYlink written 21 months ago by Kevin Blighe61k

Ok thanks, I am looking into some papers. But can you elaborate more point no.1. "for each module's eigenvalues, correlate (cor and cor.test) or regress (lm or glm) these to TumourNormal status (encoded as 1 and 0)"

ADD REPLYlink written 21 months ago by shivangi.agarwal80070

Hey, I meant to generate a plot like this, where the modules are correlated to your traits / phenotypes:

While correlating to 1 and 0 is not ideal, it can still be performed. There is a function in WGCNA to generate that plot but I canot find it right now. I have a similar function here, if you need it:



A regression analysis of the modules to TumourStatus is also possible. For example:

glm(TumourNormalStatus ~ blueModule, data=data, family=binomial(link='logit'))
glm(TumourNormalStatus ~ pinkModule, data=data, family=binomial(link='logit'))
et cetera
ADD REPLYlink modified 21 months ago • written 21 months ago by Kevin Blighe61k

Ya, this module-trait relationship plot, I have got. But to perform regression analysis of module to tumorstatus, I am giving command like, but getting error

glm(TumourNormalStatus ~ darkgreen, family=binomial(link='logit'))

Error in eval(predvars, data, env) : object 'TumourNormalStatus' not found

I dont know how to define tumornormalstatus. Can you help me to define this?. Lots of thanks. I have used codes as:

    mydata = read.csv("up_down_expression_all_samples3_only_normal.csv", sep = "\t");
datExpr0 =[, -c(1)]));
names(datExpr0) = mydata$GENE;
rownames(datExpr0) = names(mydata)[-c(1)];
gsg = goodSamplesGenes(datExpr0, verbose = 3);
sampleTree = hclust(dist(datExpr0), method = "average");
clust = cutreeStatic(sampleTree, minSize = 10)
 datExpr = datExpr0[keepSamples, ]
nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
traitData = read.csv("trait2.csv");
allTraits = traitData[,];
femaleSamples = rownames(datExpr);
traitRows = match(femaleSamples, allTraits$CaseID);
datTraits = allTraits[traitRows,-1];
rownames(datTraits) = allTraits[traitRows,1];
sampleTree2 = hclust(dist(datExpr), method = "average")
traitColors = numbers2colors(datTraits, signed = FALSE);
powers = c(c(1:10), seq(from = 12, to=20, by=2))
sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")
softPower = 14;
adjacency = adjacency(datExpr, power = softPower);
TOM = TOMsimilarity(adjacency);
dissTOM = 1-TOM
geneTree = hclust(as.dist(dissTOM), method = "average");
minModuleSize = 30;
dynamicMods = cutreeDynamic(dendro = geneTree, distM = dissTOM,
                            deepSplit = 2, pamRespectsDendro = FALSE,
                            minClusterSize = minModuleSize);
mods<- table(dynamicMods)
write.csv (mods, "modules.csv")
dynamicColors = labels2colors(dynamicMods)
MEList = moduleEigengenes(datExpr, colors = dynamicColors)
MEs = MEList$eigengenes
MEDiss = 1-cor(MEs);
METree = hclust(as.dist(MEDiss), method = "average");
MEDissThres = 0.25
abline(h=MEDissThres, col = "red")
merge = mergeCloseModules(datExpr, dynamicColors, cutHeight = MEDissThres, verbose = 3)
mergedColors = merge$colors;
mergedMEs = merge$newMEs;
moduleColors = mergedColors
colorOrder = c("grey", standardColors(50));
moduleLabels = match(moduleColors, colorOrder)-1;
MEs = mergedMEs;
nGenes = ncol(datExpr);
nSamples = nrow(datExpr);
MEs0 = moduleEigengenes(datExpr, moduleColors)$eigengenes
MEs = orderMEs(MEs0)
moduleTraitCor = cor(MEs, datTraits, use = "p");
moduleTraitPvalue = corPvalueStudent(moduleTraitCor, nSamples);
textMatrix =  paste(signif(moduleTraitCor, 2), "\n(",
                    signif(moduleTraitPvalue, 1), ")", sep = "");
dim(textMatrix) = dim(moduleTraitCor)
par(mar = c(6, 8.5, 3, 3));
# Display the correlation values within a heatmap plot
labeledHeatmap(Matrix = moduleTraitCor,
               xLabels = names(datTraits),
               yLabels = names(MEs),
               ySymbols = names(MEs),
               colorLabels = FALSE,
               colors = greenWhiteRed(50),
               textMatrix = textMatrix,
               setStdMargins = FALSE,
               cex.text = 0.5,
               zlim = c(-1,1),
               main = paste("Module-trait relationships"))
Menopausestatus =$Menopausestatus);

names(Menopausestatus) = "Menopausestatus"
modNames = substring(names(MEs), 3)

geneModuleMembership =, MEs, use = "p"));
MMPvalue =, nSamples));

names(geneModuleMembership) = paste("MM", modNames, sep="");
names(MMPvalue) = paste("p.MM", modNames, sep="");

geneTraitSignificance =, Menopausestatus, use = "p"));
GSPvalue =, nSamples));

names(geneTraitSignificance) = paste("GS.", names(Menopausestatus), sep="");
names(GSPvalue) = paste("p.GS.", names(Menopausestatus), sep="");
module = "pink"
column = match(module, modNames);
moduleGenes = moduleColors==module;
ADD REPLYlink modified 21 months ago • written 21 months ago by shivangi.agarwal80070

you can just add 0 or 1 as tumour status

ADD REPLYlink written 9 months ago by krushnach80810

old question but its more related to R ,I get module plus the tumour status as one data frame , " data=data" this is the total dataframe .isnt it?

ADD REPLYlink written 7 months ago by krushnach80810
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1825 users visited in the last hour