WGCNA consensus for 3 datasets
1
0
Entering edit mode
2.3 years ago
jms2520 ▴ 30

Is it possible to run a consensus analysis in WGCNA for 3 different datasets?

I have a large RNA-seq dataset that I am trying to run WGCNA on. The data set has many variables including (including 3 brain regions, 3 ages, sex, 2 genotypes.) When running the analysis on the entire data set as a whole pretty much everything was driven by the major differences between brain region. I am more interested in parameters such as sex and genotype so I decided to take a different approach and separate the data into 3 groups based on region and run a consensus analysis. Though I am not sure if this is mathematically possible.

RNAseq WGCNA • 4.1k views
ADD COMMENT
0
Entering edit mode
2.3 years ago

You can create a consensus TOM from three different dataset using the function consensusTOM. This function has a lot of arguments so read carefully the Details section

edit: link to consensusTOM

ADD COMMENT
0
Entering edit mode

Does this mean I cannot use the regular suggested network construction?

net = blockwiseConsensusModules(
multiExpr, power = 6, minModuleSize = 30, deepSplit = 2,
pamRespectsDendro = FALSE,
mergeCutHeight = 0.25, numericLabels = TRUE,
minKMEtoStay = 0,
saveTOMs = TRUE, verbose = 5)

I have successfully loaded all three sets. Or would this be utilized during visualization instead of 1-TOMsimilarityFromExpr? I have been following tutorial #2 for consesus analysis (https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/)

ADD REPLY
0
Entering edit mode

Does this mean I cannot use the regular suggested network construction?

If you have limited RAM, you can use blockwiseConsensusModules instead of consensusTOM. The output of consensusTOM is just a consensus TOM that must be converted into a dissimilarity matrix: disTOM = 1 - consensusTOM in order to proceed with the clustering and module identification

ADD REPLY
0
Entering edit mode

I have 32 gigs of RAM for 19,000 genes and it took a few hours for the blockwiseConsensusModules function to run. Eventually it seemed to work but I wasn't sure if this function was appropriate to use for 3 data sets.

ADD REPLY
0
Entering edit mode

32 is plenty of RAM for 19k genes. This function can be used on three or more data sets. Always check the description:

Perform network construction and consensus module detection across several datasets.

ADD REPLY
0
Entering edit mode

I suppose I am slightly confused the difference between blockwiseConsensusModules and consensusTOM and how they are used in context. Are they interchangeable? I assumed I needed to use blockwiseConsensusModules to actually construct the network?

Do I just use consensusTOM as I would for network construction instead of blockwiseConsensusModules and then proceed to use disTOM = 1 - consensusTOM as follows?

net = consensusTOM( multiExpr, power = 6, maxBlockSize = 19000, minModuleSize = 30, deepSplit = 2, pamRespectsDendro = FALSE, mergeCutHeight = 0.25, numericLabels = TRUE, minKMEtoStay = 0, saveTOMs = TRUE, verbose = 5)

dissTOM = 1-consensusTOM(multiExpr, power = 6);

ADD REPLY
0
Entering edit mode

I know it is confusing but keep in mind that in WGCNA the TOM is the network. The functions consensusTOM and blockwiseConsensusModules give you back a TOM but only blockwiseConsensusModules perform the clustering and the module detection.

This chunck of code is wrong:

net = consensusTOM( multiExpr, power = 6, maxBlockSize = 19000, minModuleSize = 30, deepSplit = 2, pamRespectsDendro = FALSE, mergeCutHeight = 0.25, numericLabels = TRUE, minKMEtoStay = 0, saveTOMs = TRUE, verbose = 5)

because minModuleSize, pamRespectsDendro, mergeCutHeight are not arguments of the function consensusTOM

Please, read the description of consensusTOM and blockwiseConsensusModules

Calculation of consensus Topological Overlap with consensusTOM

conTOM = consensusTOM(multiExpr, power = 6) # this is just an example 

Clustering and module identification

disTOM = 1-conTOM
consTree = hclust(as.dist(disTOM), method = "average");
unmergedLabels = cutreeDynamic(dendro = consTree, distM = disTOM, deepSplit = 2, cutHeight = 0.995, minClusterSize = minModuleSize, pamRespectsDendro = FALSE ); 
unmergedColors = labels2colors(unmergedLabels)

From here you can proceed with 2.a.7

ADD REPLY
0
Entering edit mode

So what would be the benefit of using consensusTOM vs blockwiseConsensusModules when essentially it seems that they are doing the same thing except consensusTOM seems to take the additional steps to cluster and identify the modules? If blockwiseConsensusModules can be used to achieve the same thing in one step and use less memory why is this not the preferred method?

Additionally, following the step-wise network construction- it looks like there is a whole part of the tutorial where there is "Scaling of Topological Overlap Matrices to make them comparable across sets"- is this something that can also be adapted to be used for three data-sets?

ADD REPLY
0
Entering edit mode

I went to the step-wise network construction tutorial and followed each step until the consensusTOM and my computer crashed at the following line because it ran out of ram (it crashed while at 30 gigs)

# Initialize an appropriate array to hold the TOMs
TOM = array(0, dim = c(nSets, nGenes, nGenes));
for (set in 1:nSets)
 TOM[set, , ] = TOMsimilarity(adjacencies[set, , ]);
ADD REPLY
0
Entering edit mode

This is strange because you managed to get a consensusTOM with blockwiseConsensusModules without any problem. Is that right?

Can you post the command line of the functions consensusTOM with blockwiseConsensusModules?

ADD REPLY
0
Entering edit mode

I started trying to follow the step-wise network construction tutorial and ran:

softPower = 6;
adjacencies = array(0, dim = c(nSets, nGenes, nGenes));
 for (set in 1:nSets)
   adjacencies[set, , ] = abs(cor(multiExpr[[set]]$data, use = "p"))^softPower;

TOM = array(0, dim = c(nSets, nGenes, nGenes));
for (set in 1:nSets)

TOM[set, , ] = TOMsimilarity(adjacencies[set, , ]);

But did not continue with this strategy because R-studio kept aborting and after checking my computer's memory usage it used 30 gigs of RAM before aborting. So I totally removed the function consensusTOM because my computer couldn't get past that line without crashing. After loading all of my data I determined the the soft thresholding I ran the blockwiseConsensusModules

enableWGCNAThreads()
nSets = checkSets(multiExpr)$nSets
powerTables = vector(mode = "list", length = nSets);
for (set in 1:nSets)
  powerTables[[set]] = list(data = pickSoftThreshold(multiExpr[[set]]$data, powerVector=powers,
                                                     verbose = 2)[[2]]);
##skipping all of the plotting so this post isn't too long
net = blockwiseConsensusModules(
  multiExpr, power = 6, minModuleSize = 30, deepSplit = 2, maxBlockSize = 19000,
  pamRespectsDendro = FALSE,
  mergeCutHeight = 0.25, numericLabels = TRUE,
  minKMEtoStay = 0,
  saveTOMs = TRUE, verbose = 5)

Calculating consensus modules and module eigengenes block-wise from all genes
 Calculating topological overlaps block-wise from all genes
   Flagging genes and samples with too many missing values...
    ..step 1
     ..bad gene count: 0, bad sample counts: 0, 0, 0
 ....Working on set 1
    TOM calculation: adjacency..
    ..will use 7 parallel threads.
     Fraction of slow calculations: 0.000000
    ..connectivity..
    ..matrix multiplication (system BLAS)..
    ..normalization..
    ..done.

Then it loops and does this for all three sets

ADD REPLY
0
Entering edit mode

In the the first approach step-wise network construction R-studio run out of memory because you have 6 matrices (3 adjacencies and 3 TOMs).

It doesn't matter at this point because blockwiseConsensusModules with maxBlockSize = 19000 should give you back the same consensus TOM of:

conTOM = consensusTOM(multiExpr, power = 6, maxBlockSize = 19000)
ADD REPLY
0
Entering edit mode

Thank you so much for all of your help. I just have one follow up question: One thing that seemed interesting about the step-wise network construction is the fact that there was the opportunity to scale the TOMs. When calculating the soft thresholding I get the following graphs. Clearly the Cerebellum and the S1 have similar "fits" but it looks like the hippocampus has lower power. I chose a power of 6 because it is where they all seem to meet. Do you think this data needs be scaled to account for any bias from that third data set? There doesn't seem to be any outliers in that data when looking at the dendrogram

enter image description here

ADD REPLY
0
Entering edit mode

You have two scaling options in blockwiseConsensusModules: single quantile and full quantile. You select the scaling with the argument networkCalibration. Read the detail to understand what each scaling does.

About the power… you said you do not have any evident outlier, then you have 2 huge problems here.

First, the mean connectivity in S1 and C is too high no matter which power you pick. For each of these datasets, you will get one huge cluster of genes up and down-regulated between males and females. The results of a consensus TOM between S1 and C is a single cluster where everything is connected (not even close to a scale-free topology) -> hairball -> not interesting. Second, the hippocampus is rather flat and probably you do not have any expression pattern. So you might have 2 problems 1) no variability in the hippocampus sample; 2) you have a strong driver of variation, probably sex, in S1 and C samples

May I ask you where did you get these data and how were normalized?

ADD REPLY
0
Entering edit mode

My lab ran a time course RNA seq on bulk tissue from our novel Knock out mouse model and we collected 4 samples from each group (including 3 brain regions, 3 ages, sex, 2 genotypes.) The RNA seq was analyzed by a data science core facility using limma voom. The input I used is the log transformed normalized data

ADD REPLY
0
Entering edit mode

First of all, ask for the raw expression data.

Second, I am not familiar with the limma-voom normalization but if it try to model the variance according to the experimental design this would explain why you get these huge single clusters showing opposite expression trends between brain regions (first network approach) or sex (consensus network). You could try DESeq2 varianceStabilizingTransformation with blind = TRUE for normalization.

Finally, not every data set can analyzed with WGCNA. Given the limited information I have, maybe there is nothing wrong with your data. This is how the data behave and a differential expression analysis is perhaps the best strategy

Cheers

ADD REPLY
0
Entering edit mode

Limma voom does a number of things for normalization that could be problematic. 1) It creates a general linear model for the data which we liked because it helped to take into account the many variables. 2) since samples from different regions often come from the same mouse (i.e. replicate 1 from the cerebellum will often come from the same mouse as the replicate 1 from the hippocampus) so limma voom correlates samples from the same mouse. 3) The "voom" function adds weights to the samples based on how closely they cluster within a PCA

Perhaps if I use the raw expression data and use fpkm and log transform it? enter image description here enter image description here enter image description here

ADD REPLY
0
Entering edit mode

Perhaps if I use the raw expression data and use fpkm and log transform it?

Yes, you can log transform fpkm data and then proceed with WGCNA or better, use DESeq2 normalization.

One last thing, this post could really help you. See Peter Langfelder answers.

ADD REPLY
0
Entering edit mode

Thank you so much for all of your help! I will try out all of these suggestions!

ADD REPLY
0
Entering edit mode

hello, sorry to piggy back on this discussion but I'm also in the same situation with having three data sets. I want to see whether there are any networks preserved between A in a combination of BC. Is this a possible since the conventional way is to set two setLabels. How does your multiExpr data look like? Thanks

ADD REPLY
0
Entering edit mode

This is just an example

# A expression data are from column 1 to 27 in datExpr
indexA=c(1:27)
# BC expression data are from column 28 to 81 in datExpr
indexBC=c(28:81)
# load the expression variables in multiExpr
multiExpr = list();
multiExpr[[1]] = list(data = datExpr[indexA, ])
multiExpr[[2]] = list(data = datExpr[indexBC, ])
# For easier labeling of plots, create a vector holding descriptive names of the two sets.
setLabels = c("A", "BC")
names(multiExpr) = setLabels
lapply(multiExpr, lapply, dim)
ADD REPLY

Login before adding your answer.

Traffic: 2562 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6