I am working with an environmental RNA-seq data set, i.e. a mixed community, with over 200,000 assembled contigs. I don't have many people to discuss my methods with, so feedback on how I did this analysis would be very helpful.
For WGCNA, instead of working at the individual contig/gene level, I aggregated to the KEGG ID level and now have about 1,500 IDs. Each ID is a composite of all the different genera/species/strains, which lets me work at a functional level. Broadly, I am curious which processes are more likely to be associated with certain environmental conditions, and the analysis seems to have worked for this purpose.
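To make the aggregation step concrete, here is a minimal Python sketch of collapsing contig-level counts to KEGG IDs. The contig names, KEGG IDs and counts are made up for illustration, and summing counts per ID is an assumption about how the contigs are combined; this is an illustration, not the actual pipeline.

```python
from collections import defaultdict

# Hypothetical contig-level counts: contig ID -> counts in each of 3 samples.
contig_counts = {
    "contig_1": [10, 0, 5],
    "contig_2": [3, 7, 1],
    "contig_3": [0, 2, 8],
    "contig_4": [4, 4, 4],
}

# Hypothetical annotation: contig ID -> KEGG ID (contig_4 has no annotation).
contig_to_kegg = {
    "contig_1": "K00001",
    "contig_2": "K00001",
    "contig_3": "K00002",
}


def aggregate_by_kegg(counts, annotation):
    """Sum per-sample counts over all contigs that share a KEGG ID."""
    n_samples = len(next(iter(counts.values())))
    totals = defaultdict(lambda: [0] * n_samples)
    for contig, values in counts.items():
        kegg_id = annotation.get(contig)
        if kegg_id is None:
            continue  # unannotated contigs are dropped (an assumption)
        for i, value in enumerate(values):
            totals[kegg_id][i] += value
    return dict(totals)


kegg_counts = aggregate_by_kegg(contig_counts, contig_to_kegg)
for kegg_id, values in sorted(kegg_counts.items()):
    print(kegg_id, values)
```

Summing means an abundant taxon can dominate the composite signal for an ID, which is worth keeping in mind when interpreting module-trait correlations.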
I followed the manual pretty closely and did not adjust p-values. After merging modules I have only 4. I am OK with this because, before merging, the modules were showing very similar correlations with traits, and I wanted to group these together to minimize redundancy.
My first question: the module membership p-value (unadjusted) is not significant for some of my genes of interest, although it is for many. Can/should I still discuss these genes? I interpret a non-significant value to mean that the correlation wasn't strong enough to be statistically significant, although the gene still clustered in this module and not in the "gray" module. Some of these non-significant genes, clustering along with others that were significant in that module, show an interesting pattern that is consistent with the literature.
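For reference, this is how I understand the quantity in question: module membership is the Pearson correlation between a gene's profile and the module eigengene, and its p-value is the usual two-sided test of zero correlation. The sketch below is a minimal Python illustration with invented numbers. It gets the p-value by numerically integrating the null density of r, which should match the Student t-test that, as far as I understand, WGCNA's corPvalueStudent uses. The function names here are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cor_pvalue(r, n_samples, steps=20000):
    """Two-sided p-value for H0: correlation = 0.

    Under H0 the density of r is proportional to (1 - r^2)^((n - 4) / 2),
    so the p-value is the tail mass beyond |r| (midpoint-rule integration).
    """
    if n_samples < 4:
        raise ValueError("this sketch needs at least 4 samples")
    power = (n_samples - 4) / 2

    def integral(a, b):
        h = (b - a) / steps
        return h * sum((1 - (a + (i + 0.5) * h) ** 2) ** power
                       for i in range(steps))

    return min(1.0, integral(abs(r), 1.0) / integral(0.0, 1.0))


# Invented example: a module eigengene and one gene across 6 samples.
eigengene = [-0.5, -0.3, -0.1, 0.1, 0.3, 0.5]
gene = [2.0, 3.5, 2.5, 4.0, 3.0, 5.0]

membership = pearson(gene, eigengene)
print("module membership:", round(membership, 3))
print("p-value:", round(cor_pvalue(membership, len(gene)), 3))
```

With few samples, even a fairly high membership value can fail to reach significance, which is part of why I am unsure how much weight to put on these p-values.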
Second question: is it routine to adjust all of the WGCNA p-values? If so, why is the default not to do this? I'm having trouble figuring out when you absolutely should adjust (to follow WGCNA best practices) and when it doesn't make sense given the complications of a messy, less-than-ideal environmental data set.
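By "adjust" I mean something like a Benjamini-Hochberg (FDR) correction applied across all the p-values in a given table. A minimal Python sketch of that procedure follows, using invented p-values. Choosing BH over another correction is an assumption here, not something the manual prescribes.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented unadjusted p-values for four genes in one module.
raw = [0.01, 0.04, 0.03, 0.20]
for p, q in zip(raw, bh_adjust(raw)):
    print(f"raw={p:.3f}  adjusted={q:.3f}")
```

The adjusted values depend on how many tests are included, so which set of p-values gets corrected together (per module, per trait, or everything at once) matters as much as whether to correct at all.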
These calls are difficult to make, so I would be grateful for your feedback.