Question

Can/should you interpret non-significant genes that cluster into modules with WGCNA?

0

Entering edit mode

4.0 years ago

N15 ▴ 160

Hello everyone,

I am working with an RNA-seq data set from the environment, so it's a mixed community with over 200,000 contigs assembled. I don't have many people to discuss my methods with and it would be very helpful to get your feedback on how I did this analysis.

For the WGCNA analysis, instead of working on the individual contig/gene level, I aggregate to the KEGG ID level and now have a list of about 1,500 IDs. This creates a composite of all the different genera/species/strains and allows me to work on a functional level. Broadly, I am curious which processes are more likely associated with certain environmental conditions, and the analysis seems to have worked out for this purpose.

I followed the manual pretty closely, and did not adjust p values. I merged modules and only have 4 - I am ok with this because otherwise, the modules were showing very similar correlations with traits. I wanted to group these together and minimize redundancy.

My first question is if the module membership p value (unadjusted) is not significant for some of the genes of interest (although many are). Can/should you still discuss these genes? I interpret this to mean that the correlation wasn't robust enough to be statistically significant, although the gene still clustered in this module and not in the "gray" module. Some of these non-significant genes, clustering along with others that were significant in that module, are showing an interesting pattern that is consistent with the literature.

Second question - is it routine to adjust all of the wgcna p values? Why is the default to not do this then? I'm having trouble figuring out when you absolutely should do this (to follow wgcna best practices), and when it doesn't make sense to based on the complications of your less than ideal and messy environmental data set.

It's difficult making these calls and so I would be grateful for your feedback.

wgcna stats RNA-Seq • 1.6k views

ADD COMMENT • link 4.0 years ago by N15 ▴ 160

Kevin Blighe · Answer 1 · 2020-05-05

1

Entering edit mode

4.0 years ago

andres.firrincieli 3.6k

Hi

Broadly, I am curious which processes are more likely associated with certain environmental conditions, and the analysis seems to have worked out for this purpose.

Since you are working with a meta-transcriptome It would be interesting to look at the taxa that mostly contribute to the 'expression' of each KO (kegg orthologues); e.g. for a given KO, which taxa-specific transcripts significantly changes across your conditions? or for a given KO, which taxa-specific transcripts significantly correlate with your environmental variables?

My first question is if the module membership p value (unadjusted) is not significant for some of the genes of interest (although many are). Can/should you still discuss these genes?

The kME (module membership) is the correlation (not p-value) between the 'expression' of your KO with the 1st PC (principal component. i.e. module eigengene) of the corresponding module. Therefore, you should look at the KO with the highest module memership (kME).

Second question - is it routine to adjust all of the wgcna p values?

Which p-values? The ones from the module-trait relationship?

ADD COMMENT • link 4.0 years ago by andres.firrincieli 3.6k

0

Entering edit mode

Hello and thank you for the reply!

Yes, one could absolutely take a look at taxa-specific gene expression using WGCNA. I am considering this for future analyses. For the present one, I was simply curious which metabolic processes were overtly different between regions, and therefore was sticking to all taxa within a certain phylum. The extra level of detail could be interesting regarding which taxa contributed to individual signals but I decided to keep it broad for the purposes of this analysis. Does this make sense? Also, 200,000 genes is nontrivial to handle and it seems that is not the norm for WGCNA. I wasn't able to build the network locally on my computer, although this could be done on a remote computer with more RAM.

The kME Pearson correlation is associated with a p value and both are given by WGCNA. For instance, I have a gene with a weak correlation is 0.1 to this module eigengene and a non-significant p value, but it was assigned to the same module as many other biologically related genes. Can you say anything about this gene? Or toss it out?

My second questions is for all of wgcna correlations - module/trait, and kME. Is it routine to adjust those p values?

ADD REPLY • link 4.0 years ago by N • 0

1

Entering edit mode

The extra level of detail could be interesting regarding which taxa contributed to individual signals but I decided to keep it broad for the purposes of this analysis. Does this make sense? Also, 200,000 genes is nontrivial to handle and it seems that is not the norm for WGCNA. I wasn't able to build the network locally on my computer, although this could be done on a remote computer with more RAM.

You do not need to build a network for the 200,000 transcripts. That is just too much. Since you already have the network, just look at the taxa that contributes to the 'expression' of the KOs with the highest kME (the hub of the modules).

Can you say anything about this gene? Or toss it out?

You shoudl toss it out. I would focus my analysis on the KO with the highest kME. However, I would consider the KO with low kME only if they are involved in the same pathway of the KO included in the hub.

My second questions is for all of wgcna correlations - module/trait, and kME. Is it routine to adjust those p values?

It is not part of the WGCNA pipleine, so I do not think is necessary.

One question. signedKME is the function used to calculate the kME. Just out of curiosity, how did you get the p-value?

ADD REPLY • link 4.0 years ago by andres.firrincieli 3.6k

0

Entering edit mode

Ok thank you so much! So helpful to get another person's take on this.

I actually did not use signedKME to get module membership. I followed the manual:

for (mod in 1:ncol(geneModuleMembership))
{
  oldNames = names(geneInfo0)
  geneInfo0 = data.frame(geneInfo0, geneModuleMembership[, modOrder[mod]],
                         MMPvalue[, modOrder[mod]]);
  names(geneInfo0) = c(oldNames, paste("MM.", modNames[modOrder[mod]], sep=""),
                       paste("p.MM.", modNames[modOrder[mod]], sep=""))
}

ADD REPLY • link updated 4.0 years ago by Kevin Blighe 88k • written 4.0 years ago by N • 0

0

Entering edit mode

That sounds!

Have fun

ADD REPLY • link 4.0 years ago by andres.firrincieli 3.6k