Question: RNA Seq data for WGCNA
3.1 years ago, vrrani wrote:


I am trying to find correlations between genes from the fold-change values of RNA-seq data, using WGCNA. My goal is module identification. In the process I learned that the recommended minimum sample size for WGCNA is 15, and that noisy results are to be expected with fewer than 15 samples. My final filtered data set is 1000 × 4 (1000 genes and 4 logFC values).


  1. Can I still use WGCNA, given that my focus is only on module identification?
  2. If noise is unavoidable, to what extent would it bias the results?
  3. Is it completely wrong to use WGCNA with such a small sample size?

Please let me know.

Thanks in advance.

Regards, Rani.

rna-seq • 2.8k views
modified 3.1 years ago by pixie@bioinfo • written 3.1 years ago by vrrani

If the manual says that you should not use fewer than 15 samples, then you will obviously introduce some bias, and you are also forcing the model onto data it was not designed for. However, if your data is solid and deeply sequenced with little noise, you could probably still find something. To be on the safer side: if you want to understand modules based on FC, that is, genes that cluster together while still preserving the separation between your phenotype conditions, you can also use methods like K-means clustering, especially when you have few samples. That way you avoid the risk of using fewer samples than the WGCNA manual recommends.
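
A minimal sketch of the K-means route, assuming a genes × 4 logFC matrix held as a plain list of rows. The toy data, the choice of k = 2, and the farthest-first initialisation are illustrative assumptions, not something from this thread; in practice k has to be chosen and the clusters checked for biological sense.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(rows, k):
    """Deterministic start: first row, then repeatedly the row farthest from all chosen centers."""
    centers = [list(rows[0])]
    while len(centers) < k:
        nxt = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(nxt))
    return centers


def kmeans(rows, k, n_iter=100):
    """Lloyd's k-means on a list of numeric rows; returns (labels, centers)."""
    centers = farthest_first(rows, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each gene goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(row, centers[c])) for row in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center becomes the mean of its member genes.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical toy data: 20 genes up (~ +2 logFC) and 20 genes down (~ -2 logFC) in 4 contrasts.
rng = random.Random(1)
up = [[rng.gauss(2, 0.3) for _ in range(4)] for _ in range(20)]
down = [[rng.gauss(-2, 0.3) for _ in range(4)] for _ in range(20)]
logfc = up + down

labels, centers = kmeans(logfc, k=2)
print(labels)
```

With only 4 columns, these clusters group genes by the shape of their logFC profile; they are not co-expression modules in the WGCNA sense.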

written 3.1 years ago by ivivek_ngs

When you say

1000 * 4 (1000 - genes and 4 - logFC value).

Does it mean you have filtered by logFC? You shouldn't do that; see the WGCNA FAQ. If you have just 4 samples, the results will be spurious. WGCNA relies on correlations, and with just 4 samples you can quite easily get a high correlation by chance; that's why more samples are important.
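
The point about chance correlations can be checked with a small simulation. The sketch below assumes every "gene" is pure independent Gaussian noise and counts how many gene pairs still reach |r| >= 0.9; the gene count, cutoff, and seed are arbitrary choices for illustration. For independent normal values, the Pearson r computed from 4 points is uniformly distributed on [-1, 1], so roughly one random pair in ten should pass this cutoff with 4 samples, and far fewer with 15.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fraction_strong_pairs(n_samples, n_genes=200, cutoff=0.9, seed=0):
    """Fraction of gene pairs with |r| >= cutoff when every gene is pure noise."""
    rng = random.Random(seed)
    genes = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_genes)]
    strong = total = 0
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            total += 1
            if abs(pearson(genes[i], genes[j])) >= cutoff:
                strong += 1
    return strong / total


frac_4 = fraction_strong_pairs(4)
frac_15 = fraction_strong_pairs(15)
print(f"n=4:  fraction of noise pairs with |r| >= 0.9 = {frac_4:.4f}")
print(f"n=15: fraction of noise pairs with |r| >= 0.9 = {frac_15:.6f}")
```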

written 3.1 years ago by Lluís R.

Yes, I filtered my data by logFC. I also observed the tendency toward high correlations in the output. Is there any other way, apart from traditional K-means, to identify modules based on logFC so that the modules also make sense biologically?

written 3.1 years ago by vrrani

First, it doesn't make sense to look for biologically interesting modules and then restrict to those which have a similar change. Run WGCNA on all samples that share the same biological state (don't mix condition A and controls). Then evaluate whether those modules are coherent between the two conditions: maybe a module splits in two in condition B, or two modules join to form a new one. Finally, use gene set enrichment tools to evaluate whether the modules make sense biologically.
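
One simple way to see whether a module splits or merges between conditions is to tabulate the gene overlap of every module pair. The sketch below uses the Jaccard index on hypothetical gene-to-module assignments (the gene names and module labels are made up); it is a rough stand-in for illustration, not WGCNA's own module preservation statistics.

```python
from collections import defaultdict


def modules_to_sets(assignment):
    """Turn a gene -> module mapping into module -> set of genes."""
    out = defaultdict(set)
    for gene, module in assignment.items():
        out[module].add(gene)
    return dict(out)


def jaccard(a, b):
    """Jaccard index of two non-empty gene sets."""
    return len(a & b) / len(a | b)


def overlap_table(assign_a, assign_b):
    """Jaccard overlap for every (module in A, module in B) pair."""
    mods_a = modules_to_sets(assign_a)
    mods_b = modules_to_sets(assign_b)
    return {
        (ma, mb): jaccard(genes_a, genes_b)
        for ma, genes_a in mods_a.items()
        for mb, genes_b in mods_b.items()
    }


# Hypothetical assignments: module "blue" in condition A splits into M1 and M2 in condition B.
cond_a = {"g1": "blue", "g2": "blue", "g3": "blue", "g4": "blue", "g5": "red", "g6": "red"}
cond_b = {"g1": "M1", "g2": "M1", "g3": "M2", "g4": "M2", "g5": "M3", "g6": "M3"}

table = overlap_table(cond_a, cond_b)
for (mod_a, mod_b), score in sorted(table.items()):
    print(f"{mod_a:5s} vs {mod_b}: {score:.2f}")
```

A module that overlaps two modules in the other condition about equally suggests a split; a single overlap near 1 suggests the module is preserved.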

written 3.1 years ago by Lluís R.

WGCNA will not work, or let's say it will give non-significant results that should not be trusted. They will not be robust at all, given the metrics provided for samples and genes. GO might still provide some information if proper clustering is done based on them, or if clustering into gene modules is followed by module-specific GO analysis.

modified 3.1 years ago • written 3.1 years ago by ivivek_ngs

Sorry, but I don't think I understood your comment. Would you mind clarifying why you say WGCNA will not work? Which metrics for samples and genes are you talking about? By gene set enrichment tools I mean to include gene ontologies.

written 3.1 years ago by Lluís R.

From what the OP says, I think the OP doesn't have 15 samples in total to start with. That is why I said it. However, traditional GO analysis, narrowing down to modules of GO categories that cluster genes, is perfectly fine, depending on the processes they return. But co-expression modules of genes built by WGCNA from fewer than 15 samples might not be very trustworthy. This is what I wanted to say. Now, if the OP says that these 4 FC columns are derived from more than 15 samples, then it should not be a problem.

written 3.1 years ago by ivivek_ngs
3.1 years ago, pixie@bioinfo (Université Paris, Saclay) wrote:

You could go for a standard mutual-rank-based correlation network or a Pearson correlation network with a high threshold on the correlation values. You can then use any clustering algorithm or plugin in Cytoscape to cluster your network. Functional/pathway enrichment tools can then be applied to the clusters to validate them biologically.
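
A rough sketch of the thresholded Pearson network, assuming expression values are held in a dict of gene -> list of values. The gene names, values, and the 0.9 cutoff are made up for illustration. Connected components are used here as the simplest possible stand-in for a Cytoscape clustering plugin, and the mutual-rank variant is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def threshold_network(expr, cutoff):
    """Adjacency sets linking genes whose |Pearson r| is at least the cutoff."""
    genes = list(expr)
    adj = {g: set() for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= cutoff:
                adj[g].add(h)
                adj[h].add(g)
    return adj


def connected_components(adj):
    """Group genes into connected components with a depth-first search."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(adj[node] - seen)
        comps.append(comp)
    return comps


# Hypothetical expression profiles over 6 samples: two groups with different patterns.
expr = {
    "a1": [1, 2, 3, 4, 5, 6],
    "a2": [3, 5, 7, 9, 11, 13],
    "a3": [1.1, 1.9, 3.2, 3.9, 5.1, 6.0],
    "b1": [2, 5, 2, 5, 2, 5],
    "b2": [1, 3, 1, 3, 1, 3],
    "b3": [2.1, 4.8, 2.2, 5.1, 1.9, 5.0],
}

adj = threshold_network(expr, cutoff=0.9)
comps = connected_components(adj)
print(comps)
```

Note that a high cutoff does not by itself protect against chance correlations when there are only a handful of samples.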

written 3.1 years ago by pixie@bioinfo