I have been following the WGCNA tutorial by Peter Langfelder and Steve Horvath (https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/), but with my own dataset. However, my final dendrogram shows that basically ALL of the genes in my dataset cluster together. What should I do, or which parameters affect this?
I have decided to give an answer, as this question will likely attract hits from search engines (based on its title).
It frequently happens that most or all genes are assigned to a single module, and you will have to go back through each step to understand why. We cannot see any of your code, input, or output, so we cannot know precisely where the issue lies.
Some things at which to look and on which to ponder:
- what is your input data? - input data should be normalised and, preferably, transformed to log (natural log or log2) or regularised log; alternatively, it should be variance-stabilised or converted to Z-scores
- is your input data too 'flat'? - check it with histograms, boxplots, and scatterplots. A person with OCPD (Obsessive-Compulsive Personality Disorder, which is different from OCD) will want a very neat dataset with all 'lumps' removed; however, biology never works that way. In the act of making data too 'clean', one may inadvertently eliminate the very signal that one wishes to detect
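- what is your sample n? - a low sample n will be problematic, because correlations estimated from few samples are noisy (the WGCNA FAQ advises against running WGCNA on fewer than 15 samples)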
- review the output of all of your WGCNA commands - do not just run the commands blindly from start to finish
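- ensure that you have chosen an appropriate soft-thresholding power, i.e., review the scale-free topology fit index and the mean connectivity reported by `pickSoftThreshold()`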
- review the tree cut height used for merging modules: the higher it is, the more readily distinct modules are merged together
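To make the input-data checks more concrete, here is a minimal Python/NumPy sketch. It is only an illustration: WGCNA itself is an R package, where `goodSamplesGenes()` takes care of removing zero-variance genes. The sketch assumes a samples-by-genes matrix of already-normalised, non-negative values (the same orientation WGCNA uses for `datExpr`); the function names, the pseudocount of 1, and the `min_sd` threshold are my own hypothetical choices.

```python
import numpy as np


def log_and_scale(counts, pseudocount=1.0, min_sd=1e-8):
    """Log2-transform normalised counts, then Z-score each gene (column).

    counts: samples x genes array of normalised, non-negative values.
    Returns the Z-scored matrix and a boolean mask of 'flat' genes whose
    standard deviation is at or below min_sd (a hypothetical threshold).
    """
    logged = np.log2(np.asarray(counts, dtype=float) + pseudocount)
    mean = logged.mean(axis=0)
    sd = logged.std(axis=0, ddof=1)
    # Genes that (almost) never change cannot carry co-expression signal
    flat = sd <= min_sd
    z = np.zeros_like(logged)
    z[:, ~flat] = (logged[:, ~flat] - mean[~flat]) / sd[~flat]
    return z, flat


def per_sample_summary(matrix):
    """Median and interquartile range per sample (row): a text stand-in for a boxplot."""
    q1, med, q3 = np.percentile(matrix, [25, 50, 75], axis=1)
    return med, q3 - q1
```

If a large share of genes comes back flagged as flat, or the per-sample medians and IQRs are suspiciously identical before any scaling, that is worth investigating before building the network.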
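Regarding the soft-thresholding power: for each candidate power, `pickSoftThreshold()` reports a scale-free topology fit index (a signed R^2 of log10 p(k) against log10 k) and the mean connectivity. The Python sketch below approximates that calculation for an unsigned network so that it is clearer what those numbers mean. It is my own simplified version, not a port of the WGCNA code: the function names are hypothetical, connectivity is split into 10 equal-width bins, and the 0.85 cutoff is borrowed from the default `RsquaredCut` of `pickSoftThreshold()`.

```python
import numpy as np


def scale_free_fit(expr, power, n_bins=10):
    """Approximate scale-free fit index and mean connectivity for one power.

    expr: samples x genes array with no zero-variance genes.
    Returns (signed R^2 of log10 p(k) vs log10 k, mean connectivity).
    """
    corr = np.corrcoef(expr, rowvar=False)  # gene-by-gene Pearson correlation
    adj = np.abs(corr) ** power             # unsigned adjacency |cor|^power
    np.fill_diagonal(adj, 0.0)              # ignore self-connections
    k = adj.sum(axis=0)                     # whole-network connectivity per gene
    # Bin connectivity into equal-width bins and keep the non-empty ones
    edges = np.linspace(k.min(), k.max(), n_bins + 1)
    idx = np.digitize(k, edges[1:-1])
    bins = np.unique(idx)
    mean_k = np.array([k[idx == b].mean() for b in bins])
    p_k = np.array([np.mean(idx == b) for b in bins])
    x, y = np.log10(mean_k), np.log10(p_k)
    slope = np.polyfit(x, y, 1)[0]
    r = np.corrcoef(x, y)[0, 1]
    # A scale-free network has a negative slope, so report -sign(slope) * R^2
    return float(-np.sign(slope) * r ** 2), float(k.mean())


def pick_power(expr, powers=range(1, 21), r2_cut=0.85):
    """Return the lowest power whose fit index reaches r2_cut (or None), plus all fits."""
    fits = {p: scale_free_fit(expr, p) for p in powers}
    for p in powers:
        if fits[p][0] >= r2_cut:
            return p, fits
    return None, fits
```

Look at both numbers together: if the fit index never gets close to the cutoff for reasonable powers while the mean connectivity stays high, something global (for example, a batch effect or an outlying group of samples) may be driving the correlations, and that can push most genes into one module.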
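On the merging cut height: in the tutorial, `mergeCloseModules()` merges modules whose eigengenes have a dissimilarity (1 minus their correlation) below `cutHeight`, which the tutorial sets to 0.25, i.e., eigengene correlation above 0.75. The Python sketch below illustrates only that criterion. It is a simplification with hypothetical names: it computes each eigengene as the first principal component of the module's standardised expression and lists candidate pairs, whereas, as far as I know, the R function clusters the eigengenes and can merge iteratively. It also does not treat any label (such as grey) specially.

```python
import numpy as np


def module_eigengene(expr, cols):
    """First principal component of the standardised expression of one module's genes."""
    sub = expr[:, cols]
    z = (sub - sub.mean(axis=0)) / sub.std(axis=0, ddof=1)
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    me = u[:, 0]
    # The SVD sign is arbitrary: orient the eigengene with the module's average profile
    if np.corrcoef(me, z.mean(axis=1))[0, 1] < 0:
        me = -me
    return me


def close_module_pairs(expr, labels, cut_height=0.25):
    """List module pairs whose eigengene dissimilarity (1 - cor) falls below cut_height."""
    labels = np.asarray(labels)
    mods = np.unique(labels)
    mes = {m: module_eigengene(expr, labels == m) for m in mods}
    pairs = []
    for i, a in enumerate(mods):
        for b in mods[i + 1:]:
            diss = 1.0 - np.corrcoef(mes[a], mes[b])[0, 1]
            if diss < cut_height:
                pairs.append((a.item(), b.item()))
    return pairs
```

Raising `cut_height` makes more pairs qualify for merging, so an overly generous value can collapse genuinely distinct modules into one.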
I hope that these pointers help.
Also note this previous answer, in which the user alluded to her input data and sample n as the source of the problem: WGCNA- Large number of genes clustering under one Module