Question: WGCNA code for chooseTopHubInEachModule
0
gravatar for krushnach80
2.6 years ago by
krushnach80810
krushnach80810 wrote:

I m using WGCNA for co-expression network im trying to find the hubgene finding fucntion but im getting error im not sure what im doing wrong.

library(WGCNA)
library(flashClust)
enableWGCNAThreads()
options(stringsAsFactors = FALSE);
countdata <- read.csv('UP_DOWN_WGCNA.csv', header=TRUE)
dim(countdata)
names(countdata)
femData <- countdata
names(femData)
datExpr = as.data.frame(t(femData[, -c(1)]))
View(datExpr)
dim(datExpr)
names(datExpr) = femData$gene
View(datExpr)
rownames(datExpr) = names(femData)[-c(1)]
dim(datExpr)
powers = c(c(1:10), seq(from = 12, to=20, by=2))
#powers = c(1:20)  # in practice this should include powers up to 20.

sft=pickSoftThreshold(datExpr,dataIsExpr = TRUE,powerVector = powers,corFnc = cor,corOptions = list(use = 'p'),networkType = "signed")
sizeGrWindow(9, 7)
par(mfrow = c(1,2))
cex1 = 0.9


plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit, signed R^2",type="n", main = paste("Scale independence"))
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],labels=powers,cex=cex1,col="red")

# Red line corresponds to using an R^2 cut-off
abline(h=0.1,col="red")

# Mean connectivity as a function of the soft-thresholding power
plot(sft$fitIndices[,1], sft$fitIndices[,5],xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",main = paste("Mean connectivity"))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")

softPower = 19

adj= adjacency(datExpr,type = "unsigned", power = softPower)

#turn adjacency matrix into topological overlap to minimize the effects of noise and spurious associations
TOM=TOMsimilarityFromExpr(datExpr,networkType = "unsigned", TOMType = "unsigned", power = softPower)

colnames(TOM) =rownames(TOM) = datExpr
dissTOM=1-TOM

geneTree = flashClust(as.dist(dissTOM),method="average")
plot(geneTree, xlab="", sub="",cex=0.3)


minModuleSize = 20;
ds = 3
cutHeight = 0.99999

# Module identification using dynamic tree cut

dynamicMods = cutreeDynamic(dendro = geneTree,  method="tree", minClusterSize = minModuleSize);
dynamicMods = cutreeDynamic(dendro = geneTree, distM = dissTOM, method="hybrid", deepSplit = 2, pamRespectsDendro = FALSE, minClusterSize = minModuleSize);

#the following command gives the module labels and the size of each module. Lable 0 is reserved for unassigned genes
table(dynamicMods)
dynamicColors = labels2colors(dynamicMods)
table(dynamicColors)

plotDendroAndColors(geneTree, dynamicColors, "Dynamic Tree Cut", dendroLabels = FALSE, hang = 0.03, addGuide = TRUE, guideHang = 0.05, main = "Gene dendrogram and module colors")

restGenes= (dynamicColors != "red")
diss1=1-TOMsimilarityFromExpr(datExpr[,restGenes], power = softPower)
collectGarbage()


colnames(diss1) =rownames(diss1) = datExpr[restGenes]
hier1=flashClust(as.dist(diss1), method="average" )
plotDendroAndColors(hier1, dynamicColors[restGenes], "Dynamic Tree Cut", dendroLabels = FALSE, hang = 0.03, addGuide = TRUE, guideHang = 0.05, main = "Gene dendrogram and module colors")
diag(diss1) = NA
sizeGrWindow(7,7)
collectGarbage()
TOMplot(diss1, hier1, as.character(dynamicColors[restGenes]))

module_colors= setdiff(unique(dynamicColors), "grey")
for (color in module_colors){
  module=datExpr[which(dynamicColors==color)]
  write.table(module, paste("module_",color, ".txt",sep=""), sep="\t", row.names=TRUE, col.names=TRUE,quote=FALSE)
}

module.order <- unlist(tapply(1:ncol(datExpr),as.factor(dynamicColors),I))
m<-t(t(datExpr[,module.order])/apply(datExpr[,module.order],2,max))
heatmap(t(m),zlim=c(0,1),col= redWhiteGreen(256),Rowv=NA,Colv=NA,labRow=NA,scale="row",RowSideColors=dynamicColors[module.order])


MEList = moduleEigengenes(datExpr, colors = dynamicColors)
MEs = MEList$eigengenes
plotEigengeneNetworks(MEs, "", marDendro = c(0,4,1,2), marHeatmap = c(3,4,1,2))


#################################################
colnames(TOM) =rownames(TOM) = datExpr
tree1 <- tree2 <- fastcluster::hclust(as.dist(1-TOM),method="average")
colorh = labels2colors(datExpr)
hubs    = chooseTopHubInEachModule(datExpr, colorh)
hubs


Warning message:
In labels2colors(datExpr) :
  labels2colors: Number of labels exceeds number of avilable colors. Some colors will be repeated 1 times.
> hubs    = chooseTopHubInEachModule(datExpr, colorh)
Error in `[.data.frame`(datExpr, , colorh == m) : 
  undefined columns selected

Im getting this error

Warning message: In labels2colors(datExpr) : labels2colors: Number of labels exceeds number of avilable colors. Some colors will be repeated 1 times.

hubs = chooseTopHubInEachModule(datExpr, colorh) Error in [.data.frame(datExpr, , colorh == m) : undefined columns selected

Any help or suggestion would be highly appreciated , am I would like to know if my script is correct or not ..

rna-seq R • 2.4k views
ADD COMMENTlink modified 2.6 years ago by RamRS27k • written 2.6 years ago by krushnach80810
1
gravatar for Kevin Blighe
2.6 years ago by
Kevin Blighe61k
University College London
Kevin Blighe61k wrote:

What are the contents of datExpr?

I presume that you have followed the tutorial by Steve Horvath?

ADD COMMENTlink written 2.6 years ago by Kevin Blighe61k

yes i followed the tutorial ,and "datExpr" contains my gene expression values except gene name ...

ADD REPLYlink written 2.6 years ago by krushnach80810
1

Okay, great! What's the output of labels2colors(datExpr)?

ADD REPLYlink written 2.6 years ago by Kevin Blighe61k

this is the message i get

Warning message: In labels2colors(datExpr) : labels2colors: Number of labels exceeds number of avilable colors. Some colors will be repeated 1 times.
ADD REPLYlink written 2.6 years ago by krushnach80810

Ah, I think that I see what's happening...

WGCNA may have identified just too many modules in your data, i.e., more modules that there are colours (by name) in the R palette! WGCNA, a you know, automatically assigns colour names to each palette.

Any way to see if that is the case? - how many modules were identified?

ADD REPLYlink written 2.6 years ago by Kevin Blighe61k

41 modules I get should i increase minModuleize which i set to 20 if i increase would it help?

ADD REPLYlink written 2.6 years ago by krushnach80810

No, not so sure that's the issue.

I think that you need to change this command:

colorh = labels2colors(datExpr)

With that, you may be assigning colours to every value in the entire datExpr object, which would explain why it complains that there are not enough available colours.

That command should just be applied to the module IDs/names, such as:

colorh = labels2colors(rownames(datExpr))

or

colorh = labels2colors(colnames(datExpr))
ADD REPLYlink written 2.6 years ago by Kevin Blighe61k

i increase minModulesize to 30 now i get like 17 module but still im getting the same issue im not sure what to do ..but i will try your suggestion

ADD REPLYlink written 2.6 years ago by krushnach80810

may be you can suggest me how to execute the chooseTopHubInEachModule argument

ADD REPLYlink written 2.6 years ago by krushnach80810
1

No, I'm highly confident that the problem relates to:

colorh = labels2colors(datExpr)

You are attempting to assign a unique colour to all values in datExpr. You only want to assign colours to your identified modules.

Take a look gain at the WGCNA tutorial:

moduleLabels = net$colors;
#Convert the numeric labels to color labels
moduleColors = labels2colors(moduleLabels)

[source: https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/Consensus-NetworkConstruction-auto.pdf - page 5]

This, then, causes the error in chooseTopHubInEachModule

ADD REPLYlink written 2.6 years ago by Kevin Blighe61k
1

Also look up in your own code where you previously used the same function:

dynamicColors = labels2colors(dynamicMods)

Always check the contents of your objects so that you understand in which contexts they are to be used.

ADD REPLYlink written 2.6 years ago by Kevin Blighe61k
1

thank you very much problem solved ,it takes a lot of time for me to go through various resources over the internet then compile and run the code and people like you are of really great help

ADD REPLYlink written 2.6 years ago by krushnach80810
1

Absolutely no problem - glad to have helped out.

When I have time, I go through tutorials like WGCNA line by line and make my own comments about what is happening. I then save the script for years later

ADD REPLYlink written 2.6 years ago by Kevin Blighe61k

yes assign colour to the module

ADD REPLYlink written 2.6 years ago by krushnach80810

yes assign colour to the module

ADD REPLYlink written 2.6 years ago by krushnach80810
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1501 users visited in the last hour