subgraph extraction based on the edges weights and graph connectivity
1
1
Entering edit mode
3.2 years ago
arronar ▴ 260

Hi.

Given a matrix that describes the edges' and their weights of a connected graph (see below) I want to extract a subgraph based on a threshold value x for the edges' weights. In literature, I read that one can search for the maximal x, such that the induced subgraph is connected. Since the initial graph is assumed connected, there must be a critical threshold x-critical that the extracted subgraph is connected for any x <= x-critical.

I was wondering how can this implemented in R. For example, my matrix (weights.matrix) looks like

| FROM | TO | WEIGHT |
| A    | B  | 0.0042 |
| A    | V  | 0.23   |
| G    | W  | 0.82   |
| ...  | ...| ...    |


and I'm creating the whole graph, by using the igraph package like

g <- igraph::graph_from_data_frame(weights.matrix, directed = TRUE)


Is there any way to check repeatedly using a different threshold value for the weights from min() to max() if the occurred graph is connected? I searched in google for such feature in igraph but couldn't find anything helpful. So any idea on how to implement this idea is welcome.

R graph subgraph extraction weight • 2.5k views
1
Entering edit mode
3.2 years ago
ejm32 ▴ 450

This will get you started, not sure that there is a more efficient way to approaching this:

library(igraph)

m <- expand.grid(LETTERS[1:10], LETTERS[1:10])
m$Weight <- runif(nrow(m), 0.01, max = 5) m <- m[m$Var1!=m$Var2, ] ##remove loop edges g <- graph_from_data_frame(m) plot(g, edge.curved=TRUE) g1 <- induced_subgraph(g, c("A","B","C")) plot(g1, edge.curved=TRUE) #E(g1)$Weight
for(xcrit in sort(E(g1)\$Weight)){
if(any(degree(subgraph.edges(g1, E(g1)[Weight > xcrit], delete.vertices = F))==0)){
print(paste0("Xcrit = ", xcrit))
break
}
}
#[1] "Xcrit = 4.5026016204874"

0
Entering edit mode

Thank you for your answer but it confused me a little bit. Why did you choose as initial nodes the A,B and C? For some reason, you chose the induced network to be constructed by those nodes. As I understood from the literature,the induced graph should be the result of the whole process. Why choose one from the beginning? Here is a pdf referring that technique in pdf's page 15 , section 5 the fourth paraagraph. You can consider the edge relevance as the edge weight discussed here.

2
Entering edit mode

What's written in this section of the paper is simply a thresholding of the adjacency relevance score matrix, i.e. set all values to 0 if they are below a threshold. The 4th paragraph proposes an automatic way of setting the threshold. If you're interested in the limited k-walk approach, I don't think it's available in igraph but I've implemented it in my perl module Algorithm::Graph (using C code from the authors). I haven't implemented the automatic threshold selection though but it is usually not difficult to find a reasonable threshold for a given data type. For example for large protein interaction graphs, I use a default of 0.0001

Edit: clarification

0
Entering edit mode

I don't think that the 4th paragraph is referring to thresholding the adjacency matrix. As I understood they propose to apply the threshold to the edge relevance scores. I have already implemented the k-walk algorithm and the results I got are in the form of the dataframe described in the OP. So now I need some help on how to implement this auto-thresholding. It's kinda confusing me. If you have any idea... :-)

What's this 0.0001? A threshold for the adjacency matrix or for the relevance scores? Here is how the distribution of my edge relevance scores looks like. You suggest applying 0.0001 to this?

1
Entering edit mode

Sorry about the confusion. I meant thresholding of relevance scores (which you can view as new edge weights). I've mostly used this for protein interaction graphs so I don't know if the default I use is applicable to other graphs. I usually define a threshold based on exploration of the returned graph, not just on the distribution of values. One way of approaching the problem of finding a threshold automatically could be to find the edges that if removed would disconnect the graph (they are called bridges and I think there's an algorithm for finding them).

0
Entering edit mode

One way of approaching the problem of finding a threshold automatically could be to find the edges that if removed would disconnect the graph (they are called bridges and I think there's an algorithm for finding them).

Isn't what the fourth paragraph is talking about? Also based on what you chose the 0.0001 for protein-protein networks?

0
Entering edit mode

Yes to the first question. Looking for the smallest value associated with bridges will give you the threshold. My value selection is based on inspection of the graphs returned by the algorithm for different thresholds.

0
Entering edit mode

What kind of inspection? Are you looking for graphs with a specific number of expanded nodes or anything different?

0
Entering edit mode

- when querying with components of some known complexes or set of interactions, how many of the other known components do we recover in the subgraph ?
- the subgraphs should be of manageable size

You can have different criteria depending on what the graph represents and what is important for you.

0
Entering edit mode

I understood your question to be: "Given a set of nodes in a graph, I want to find the minimum score at which the these nodes are fully connected (i.e. an induced subgraph)." I will look at the paper and edit my answer accordingly.

0
Entering edit mode

I see... Sorry for letting this misconception.