I have RNA-seq data that I am trying to use for WGCNA. However, I am running into downstream issues that I believe stem from my trait file. I have 24 samples: 3 replicates of each of 8 treatment groups. I am following the WGCNA tutorial but having to make minor adjustments because my treatment groups are qualitative, not quantitative. I recoded each treatment group as a number, 1-8, instead of using the treatment name. My input data file is "mat2" and my trait file is "Condition". Here is my code for my trait file upload:
CData = read.csv("~/Documents/Conditiondata.csv")
dim(CData)
names(CData)
CData2 = CData[, -c(1)] # removing the first column
names(CData2)
dim(CData2)
Samples = rownames(mat2)
conditionRows = match(Samples, CData2$Sample)
Condition = CData2[conditionRows, ]
rownames(Condition) = CData2[conditionRows, 1]
collectGarbage()
rownames(Condition) == rownames(mat2) # confirming that row names match, and this comes back as TRUE.
Once I get to the point of making the module-trait heatmap, I end up with coerced NAs for the samples. I've attached an image of my resulting heatmap. Any help is greatly appreciated! Thank you!
The NAs come from the 'Sample' column (which I assume holds a unique identifier for each sample): its values are not numbers, so they are coerced to NA and no correlation can be computed for that column.
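To see where the coercion happens, here is a minimal sketch in base R. The table `traits` and its column names are made up for illustration and are not your actual file; the point is only that a character column cannot be turned into numbers, so dropping it before computing correlations avoids the NA column.

```r
# Hypothetical stand-in for a trait table (names and values are assumptions)
traits <- data.frame(
  Sample    = c("S1", "S2", "S3"),
  treatment = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# Which columns are numeric? Sample is not.
sapply(traits, is.numeric)

# Forcing the identifier column to numeric yields NA for every entry
# (R warns "NAs introduced by coercion"; suppressed here for clarity)
sample_as_num <- suppressWarnings(as.numeric(traits$Sample))
sample_as_num

# Keep only numeric columns before correlating with module eigengenes
numericTraits <- traits[, sapply(traits, is.numeric), drop = FALSE]
names(numericTraits)
```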
However, I strongly recommend altering your approach here and using a binary matrix for these correlations. That is, assuming your CData looks something like
then using
cdata.bin <- model.matrix(~ treatment + 0, data=CData)
will give you the appropriate structure for the module trait heatmap. Right now "green" is telling you the module is "up in early groups" and red is telling you the module is "up in late groups", but you can't pinpoint specific ones. Breaking this all out into a 24x8 matrix (which model.matrix does for you) will give you far more detail.
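As a rough sketch of the whole idea, assuming a trait table with a `Sample` column and a `treatment` column (both names are guesses at your layout), something like the following should work in base R. One caveat: since you recoded the groups as the numbers 1-8, `treatment` has to be converted with `factor()` first, otherwise `model.matrix` treats it as a single numeric column rather than 8 indicator columns. The eigengene matrix `MEs` below is random noise standing in for the real module eigengenes, purely so the example runs.

```r
# Hypothetical trait table: 24 samples, 8 groups coded 1-8, 3 replicates each
CData <- data.frame(
  Sample    = paste0("S", 1:24),
  treatment = rep(1:8, each = 3)
)

# Numeric codes must become a factor to get one column per group
CData$treatment <- factor(CData$treatment)

# "+ 0" drops the intercept, so every group gets its own 0/1 column
cdata.bin <- model.matrix(~ treatment + 0, data = CData)
rownames(cdata.bin) <- CData$Sample
dim(cdata.bin)  # 24 rows (samples) by 8 columns (treatment groups)

# Placeholder eigengenes (random values, for illustration only)
set.seed(1)
MEs <- matrix(
  rnorm(24 * 3), nrow = 24,
  dimnames = list(CData$Sample, c("MEblue", "MEbrown", "MEgreen"))
)

# Module-by-treatment correlations: one value per module and group
moduleTraitCor <- cor(MEs, cdata.bin, use = "pairwise.complete.obs")
dim(moduleTraitCor)  # 3 modules by 8 treatment groups
```

With your real data, `MEs` would be the eigengenes from your WGCNA run, and the resulting matrix can be passed to the heatmap step of the tutorial in place of the single numeric trait.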