Modifying Edge Weights in WGCNA
3.5 years ago

I am creating a gene expression network. This post is mainly to have someone check my methodology, but I also have a question about the edge weights that are created using a soft threshold.

My plan is to use gene expression data to create a similarity matrix using Pearson's correlation, then a weighted adjacency matrix, and finally a weighted topological overlap matrix (TOM; note: this is the TOM for similarity). I believe the typical next step is to create a TOM for dissimilarity and use it as the input for clustering, but I am going to skip this step. As I understand it, I should now have a weighted TOM with values between 0 and 1, which can be used to create a network using graph_from_adjacency_matrix from the igraph package. I know the TOM isn't technically an adjacency matrix but more of a 'polished' version of one, which could still be used to create the network.
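The correlation -> adjacency -> TOM steps described above can be sketched in plain Python to make the arithmetic concrete. This is a minimal illustration, not WGCNA's own code: it assumes the signed soft-threshold adjacency a_ij = ((1 + r_ij) / 2)^beta (the form WGCNA uses for its "signed" network type), a zero diagonal in the adjacency, and the standard topological overlap formula TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij). The function names, beta = 6 and the toy expression values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(expr):
    """expr: one list of expression values (across samples) per gene."""
    return [[pearson(gi, gj) for gj in expr] for gi in expr]

def signed_adjacency(cor, beta):
    """Signed soft threshold: a_ij = ((1 + r_ij) / 2) ** beta, diagonal set to 0."""
    n = len(cor)
    return [[0.0 if i == j else ((1 + cor[i][j]) / 2) ** beta for j in range(n)]
            for i in range(n)]

def tom_similarity(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with TOM_ii = 1.

    l_ij = sum_u a_iu * a_uj (shared neighbours), k_i = sum_u a_iu (connectivity).
    """
    n = len(adj)
    k = [sum(row) for row in adj]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                l_ij = sum(adj[i][u] * adj[u][j] for u in range(n))
                tom[i][j] = (l_ij + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom

# Hypothetical toy data: 3 genes x 4 samples.
expr = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1]]
tom = tom_similarity(signed_adjacency(correlation_matrix(expr), beta=6))
# The "TOM dissimilarity" step the poster plans to skip is simply 1 - TOM.
dissim = [[1 - v for v in row] for row in tom]
```

With adjacency values in [0, 1], the TOM values also stay in [0, 1], which is why the similarity TOM can be passed to igraph as a weighted matrix.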

My question is: if this signed, weighted TOM network is valid, do I need to normalize the edge weights? Is there anything else I need to do, or should it be ready to go "out of the box"? I've read some papers that discuss transforming or normalizing the weights using an inverse CDF, and I'm not sure whether I need to do that to the WGCNA edge weights or whether they are fine as they are.
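The papers mentioned are not identified, so the following shows only one common variant of an inverse-CDF transform: a rank-based inverse normal transform, where each edge weight is replaced by Phi^-1((rank - 0.5) / n). It is a hedged sketch using Python's standard `statistics.NormalDist`; the (rank - 0.5) / n offset, the average-rank tie handling and the example weights are assumptions, and whether TOM weights need any such transform at all is exactly the open question in the post.

```python
from statistics import NormalDist

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for idx in order[start:end + 1]:
            ranks[idx] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def inverse_normal_transform(weights):
    """Rank-based inverse normal transform: z = Phi^-1((rank - 0.5) / n)."""
    n = len(weights)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((r - 0.5) / n) for r in average_ranks(weights)]

# Hypothetical upper-triangle TOM weights (one value per edge).
weights = [0.02, 0.40, 0.40, 0.91]
print(inverse_normal_transform(weights))
```

Note that the result contains negative values, so it would have to be shifted or otherwise mapped back to positive numbers before being used as igraph edge weights for most layouts.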

Tags: WGCNA, network, R, igraph

You accepted my previous answer here: Weighted Gene Expression Network. Is this a follow-up question?

Your logic sounds fine, but you do not require anything from WGCNA if you are just going to use igraph. Your starting point just needs to be a data matrix of genes and samples. You are correct in implying that one does not require a dissimilarity matrix - it actually makes more intuitive sense to leave it as similarity via the 'raw' Pearson correlation values (or Spearman or Kendall correlation if your dataset is better suited to non-parametric, rank-based methods).
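To illustrate the Spearman option: Spearman's rho is simply the Pearson correlation computed on ranks, so it captures monotonic but non-linear relationships that Pearson understates. A minimal sketch with hypothetical values (Kendall's tau is not shown):

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for idx in order[start:end + 1]:
            ranks[idx] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]  # hypothetical monotonic but non-linear expression profile
print(pearson(x, y), spearman(x, y))
```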

Keep in mind that, if you leave it as raw correlation values, then they will be distributed between -1 and +1, and that this is reflective of a signed network because we retain information about the direction of correlation. If you obtained the absolute correlation values, then it would be unsigned.
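In WGCNA's soft-threshold terms, the same signed/unsigned distinction looks like this: taking |r| makes a strong negative and a strong positive correlation indistinguishable, whereas the signed form pushes strong negative correlations toward 0. A small sketch; beta = 6 is only an assumed example value.

```python
def unsigned_adjacency(r, beta=6):
    """Unsigned: |r| ** beta, so r = -0.9 and r = +0.9 give the same edge weight."""
    return abs(r) ** beta

def signed_adjacency(r, beta=6):
    """Signed: ((1 + r) / 2) ** beta, so r = -1 maps to 0 and r = +1 maps to 1."""
    return ((1 + r) / 2) ** beta

for r in (-0.9, 0.0, 0.9):
    print(r, round(unsigned_adjacency(r), 4), round(signed_adjacency(r), 4))
```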

You can feed this into igraph using graph.adjacency (called graph_from_adjacency_matrix in newer versions of igraph), as I do in my tutorial: Network plot from expression data in R using igraph

You do not have to further weight the edges, but you can if you wish. You can also shade negative correlations one colour and positive correlations another, and you can modify edge thickness based on the absolute correlation value.
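The colour and thickness options above amount to deriving edge attributes from each correlation. A hedged sketch in Python rather than R: the attribute names, the cutoff and the width scaling are hypothetical choices, and weights are taken as |r| so that they stay non-negative.

```python
def correlation_edges(cor, genes, cutoff=0.0, max_width=5.0):
    """Turn a symmetric correlation matrix into an undirected edge list.

    weight = |r| (non-negative; many layout algorithms require positive weights),
    colour = red for positive r, blue for negative r,
    width  = |r| scaled to max_width for plotting.
    Pairs with |r| <= cutoff are dropped.
    """
    edges = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):  # upper triangle: each pair once, no self-loops
            r = cor[i][j]
            if abs(r) <= cutoff:
                continue
            edges.append({
                "source": genes[i],
                "target": genes[j],
                "weight": abs(r),
                "colour": "red" if r > 0 else "blue",
                "width": abs(r) * max_width,
            })
    return edges

# Hypothetical 3-gene correlation matrix.
cor = [[1.0, 0.8, -0.6],
       [0.8, 1.0, 0.1],
       [-0.6, 0.1, 1.0]]
for e in correlation_edges(cor, ["A", "B", "C"], cutoff=0.5):
    print(e)
```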

Again, as you can see, it is entirely flexible, and there is no right or wrong.


Yes, I guess this is a follow-up question; I have learned a little since the last one. Thanks for your reply on that.

So here's the thing: I definitely realize how flexible network construction can be, but there is some confusion. The pipeline WGCNA follows is correlation matrix -> adjacency matrix -> TOM similarity -> TOM dissimilarity, but you can actually create your network from any one of those steps. The idea is that each step is supposed to make the results better and more robust. My plan is to feed the TOM similarity matrix into the graph.adjacency function to create an igraph object, which I was able to do successfully. My concern is that just because the code runs doesn't mean the result is going to mean anything. Most people create networks from correlation or adjacency matrices, but has anyone tried using the TOM similarity as the input to create an igraph object, and how did the results look?
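Feeding the TOM similarity matrix into igraph essentially turns its upper triangle into weighted, undirected edges. The sketch below is intended to mirror what graph_from_adjacency_matrix(tom, mode = "undirected", weighted = TRUE, diag = FALSE) does in R igraph; the min_weight cutoff and the example matrix are hypothetical additions. It also shows why the dissimilarity (1 - TOM) is meant for clustering rather than for edge weights.

```python
def tom_to_edges(tom, genes, min_weight=0.0):
    """Undirected weighted edges from a symmetric TOM similarity matrix.

    The diagonal is ignored, each pair appears once, and pairs whose weight
    is not above min_weight (hypothetical hard cutoff) produce no edge.
    """
    edges = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if tom[i][j] > min_weight:
                edges.append((genes[i], genes[j], tom[i][j]))
    return edges

def tom_dissimilarity(tom):
    """1 - TOM: used as edge weights, the most similar genes would get the weakest edges."""
    return [[1.0 - v for v in row] for row in tom]

# Hypothetical TOM similarity for three genes.
tom = [[1.0, 0.3, 0.05],
       [0.3, 1.0, 0.2],
       [0.05, 0.2, 1.0]]
print(tom_to_edges(tom, ["A", "B", "C"], min_weight=0.1))
```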

You're right, I don't need WGCNA to create correlation and adjacency matrices, but the TOM is specific to it, so I decided to keep everything consistent. Also, I looked through your tutorial, and you take the absolute value of the weights. Why is that?


Please use ADD REPLY/ADD COMMENT when responding to existing posts to keep threads logically organized.


Yes, my mistake; I couldn't figure out how to fix it.


Yes, and it is that diversity (of ways to build networks) that is problematic for anyone just entering the network analysis field. I think that this is also why people naturally default to following the WGCNA pipeline: its tutorial is relatively good and it is widely known. My former supervisor in Boston even gives lectures on it.

Note that there are ready-made WGCNA-to-igraph functions out there; just use a search engine. There are also entire pipelines for linking both programs, like this one.

In my tutorial, an edge cannot have a negative value. So I first shade the edges with negative correlation blue, to reflect the direction of correlation, and those with positive correlation red, and then use the absolute values as weights. If you go ahead and use a negative value anyway, the edge will just not be generated.

Network analysis is network analysis... it has not lived up to the hype. It is largely an in silico exercise, and results can change just by tweaking a few parameters.


Dear Kevin, I didn't want to create another post for this, so I'm commenting here on your "Network analysis is network analysis" remark. What do you think about merging a co-expression network (e.g., from a WGCNA correlation matrix) with a gene regulatory network (e.g., from GENIE3), both filtered at a specific cutoff (in the first case a correlation value, in the second the number of targets for each TF)? Thanks for any reply.
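The two filters described (a correlation cutoff for the co-expression network, a per-TF target limit for the regulatory network) could look like this. It is only a sketch: the (regulator, target, weight) link-list format is assumed to resemble the ranked link list that GRN tools such as GENIE3 produce, and k, cutoff and all data are hypothetical example values.

```python
from collections import defaultdict

def top_targets_per_tf(links, k):
    """Keep the k highest-weight targets for each regulator.

    links: iterable of (regulator, target, weight) tuples (assumed format).
    """
    by_tf = defaultdict(list)
    for tf, target, weight in links:
        by_tf[tf].append((weight, target))
    kept = []
    for tf, items in by_tf.items():
        items.sort(reverse=True)  # highest weight first
        kept.extend((tf, target, weight) for weight, target in items[:k])
    return kept

def gcn_edges(cor, genes, cutoff):
    """Undirected co-expression edges with |r| >= cutoff (upper triangle only)."""
    return [
        (genes[i], genes[j], cor[i][j])
        for i in range(len(genes))
        for j in range(i + 1, len(genes))
        if abs(cor[i][j]) >= cutoff
    ]

# Hypothetical GRN link list: (regulator, target, importance weight).
links = [("TF1", "g1", 0.9), ("TF1", "g2", 0.5), ("TF1", "g3", 0.7), ("TF2", "g1", 0.2)]
print(top_targets_per_tf(links, k=2))

# Hypothetical co-expression correlations.
cor = [[1.0, 0.85, -0.2], [0.85, 1.0, -0.75], [-0.2, -0.75, 1.0]]
print(gcn_edges(cor, ["A", "B", "C"], cutoff=0.7))
```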


Hey, this thread is very old. I have no frame of reference for that procedure, unfortunately. Some kind of manual merge may be possible. Do you definitively have to merge the networks or could you simply use the information provided by each separate network?


(1) My idea is to merge this information using the GCN as the base. My doubt here: should I keep only the GRN edges that are already present in my GCN, or should I also add extra edges from the GRN to the GCN? In either case, I will have a mix of directed and undirected edges that I will use for gene discovery and to define causal relationships. I've screened the literature thoroughly and found nothing; apparently nobody has done it.
(2) As far as I've seen, there will be no problem visualizing the mix of directed and undirected edges in igraph, together with the module identification that I will do with WGCNA (or any other clustering algorithm). Can you confirm this?
(3) As an alternative to GENIE3, if I'd like to add extra information from PPI (e.g., the STRING database) and ChIP-seq, would you just compile it and add it to the mixed network, or have you already used programs such as CoRegNet, cMonkey2, and Merlin+P (mostly GRN tools), which can add this extra information as an extra network to better define regulatory relationships?


(1) My idea is to merge this information using the GCN as the base. My doubt here: should I keep only the GRN edges that are already present in my GCN, or should I also add extra edges from the GRN to the GCN? In either case, I will have a mix of directed and undirected edges that I will use for gene discovery and to define causal relationships. I've screened the literature thoroughly and found nothing; apparently nobody has done it.

I do not know. Your educated guess is as good as mine. I would check the overlap of the edges. Also, depending on how you constructed both the GCN and the GRN, one may be more reliable than the other. That would be the GRN, right?
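Checking the overlap, and then merging with the GCN as the base, could be sketched as follows. This is a hypothetical illustration: edge direction is ignored for the overlap (Jaccard index), and the two merge strategies correspond to the two options in question (1); the function names and data are made up.

```python
def edge_overlap(gcn_edges, grn_edges):
    """Shared edge count and Jaccard index, ignoring GRN edge direction."""
    gcn = {frozenset(e) for e in gcn_edges}
    grn = {frozenset(e) for e in grn_edges}
    shared = gcn & grn
    union = gcn | grn
    jaccard = len(shared) / len(union) if union else 0.0
    return len(shared), jaccard

def merge_networks(gcn_edges, grn_edges, strategy="intersect"):
    """Merge using the GCN as the base.

    strategy="intersect": keep only GRN edges that the GCN also supports.
    strategy="union":     add every GRN edge, supported or not.
    GRN edges are marked directed=True and replace the matching undirected
    GCN edge; all other GCN edges stay undirected (directed=False).
    """
    if strategy not in ("intersect", "union"):
        raise ValueError("strategy must be 'intersect' or 'union'")
    gcn = {frozenset(e) for e in gcn_edges}
    merged, covered = [], set()
    for tf, target in grn_edges:
        key = frozenset((tf, target))
        if strategy == "union" or key in gcn:
            merged.append((tf, target, True))
            covered.add(key)
    for a, b in gcn_edges:
        if frozenset((a, b)) not in covered:
            merged.append((a, b, False))
    return merged

gcn = [("A", "B"), ("B", "C"), ("C", "D")]
grn = [("A", "B"), ("D", "C"), ("A", "E")]
print(edge_overlap(gcn, grn))
print(merge_networks(gcn, grn, "intersect"))
print(merge_networks(gcn, grn, "union"))
```

Because a single igraph graph is either directed or undirected as a whole, one option is to build the merged network as directed and keep the directed flag as an edge attribute, for example to switch off arrowheads on the co-expression edges when plotting.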

(2) As far as I've seen, there will be no problem visualizing the mix of directed and undirected edges in igraph, together with the module identification that I will do with WGCNA (or any other clustering algorithm). Can you confirm this?

I have never done this, but I am confident that there is a way. You could just shade the vertices belonging to each module in a different colour. If you go to my tutorial, you will see how you can manipulate the attributes of the vertices.
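Shading vertices by module boils down to mapping each gene's module label to a colour and then using that as a vertex attribute. A minimal sketch, assuming integer module labels with 0 meaning "unassigned"; the palette only imitates WGCNA's colour names, and the input format is hypothetical.

```python
def module_colours(modules, unassigned=0,
                   palette=("turquoise", "blue", "brown", "yellow", "green", "red")):
    """Map each gene's module label to a vertex colour.

    modules: dict gene -> integer module label (hypothetical input format).
    Unassigned genes get "grey". Labels are sorted so the colouring is
    reproducible; colours repeat if there are more modules than palette entries.
    """
    labels = sorted({m for m in modules.values() if m != unassigned})
    lookup = {m: palette[i % len(palette)] for i, m in enumerate(labels)}
    return {gene: "grey" if m == unassigned else lookup[m] for gene, m in modules.items()}

print(module_colours({"g1": 1, "g2": 2, "g3": 0, "g4": 1}))
```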

(3) As an alternative to GENIE3, if I'd like to add extra information from PPI (e.g., the STRING database) and ChIP-seq, would you just compile it and add it to the mixed network, or have you already used programs such as CoRegNet, cMonkey2, and Merlin+P (mostly GRN tools), which can add this extra information as an extra network to better define regulatory relationships?

I have not used these. Sorry!


I do not know. Your educated guess is as good as mine. I would check the overlap of the edges. Also, depending on how you constructed both the GCN and the GRN, one may be more reliable than the other. That would be the GRN, right?

Honestly, I have no idea for my data, because unfortunately I'm working with a very non-model organism :) But it has been shown that both clustering (WGCNA) and direct network inference (NI) methods can provide good results in terms of module detection, with the former performing better than the latter.

I have never done this, but I am confident that there is a way. You could just shade the vertices belonging to each module in a different colour. If you go to my tutorial, you will see how you can manipulate the attributes of the vertices.

Thanks i'll have a look.