Question: subgraph extraction based on the edges weights and graph connectivity
1
gravatar for arronar
13 months ago by
arronar200
Austria
arronar200 wrote:

Hi.

Given a matrix that describes the edges' and their weights of a connected graph (see below) I want to extract a subgraph based on a threshold value x for the edges' weights. In literature, I read that one can search for the maximal x, such that the induced subgraph is connected. Since the initial graph is assumed connected, there must be a critical threshold x-critical that the extracted subgraph is connected for any x <= x-critical.

I was wondering how can this implemented in R. For example, my matrix (weights.matrix) looks like

| FROM | TO | WEIGHT |
| A    | B  | 0.0042 |
| A    | V  | 0.23   |
| G    | W  | 0.82   |
| ...  | ...| ...    |

and I'm creating the whole graph, by using the igraph package like

g <- igraph::graph_from_data_frame(weights.matrix, directed = TRUE)

Is there any way to check repeatedly using a different threshold value for the weights from min() to max() if the occurred graph is connected? I searched in google for such feature in igraph but couldn't find anything helpful. So any idea on how to implement this idea is welcome.

graph weight subgraph extraction R • 1.0k views
ADD COMMENTlink modified 13 months ago by ejm32440 • written 13 months ago by arronar200
1
gravatar for ejm32
13 months ago by
ejm32440
Boston, MA
ejm32440 wrote:

This will get you started, not sure that there is a more efficient way to approaching this:

library(igraph)

m <- expand.grid(LETTERS[1:10], LETTERS[1:10])
m$Weight <- runif(nrow(m), 0.01, max = 5)
m <- m[m$Var1!=m$Var2, ] ##remove loop edges

g <- graph_from_data_frame(m)
plot(g, edge.curved=TRUE)

g1 <- induced_subgraph(g, c("A","B","C"))
plot(g1, edge.curved=TRUE)
#E(g1)$Weight
for(xcrit in sort(E(g1)$Weight)){
  if(any(degree(subgraph.edges(g1, E(g1)[Weight > xcrit], delete.vertices = F))==0)){
    print(paste0("Xcrit = ", xcrit))
    break
    }
}
#[1] "Xcrit = 4.5026016204874"
ADD COMMENTlink written 13 months ago by ejm32440

Thank you for your answer but it confused me a little bit. Why did you choose as initial nodes the A,B and C? For some reason, you chose the induced network to be constructed by those nodes. As I understood from the literature,the induced graph should be the result of the whole process. Why choose one from the beginning? Here is a pdf referring that technique in pdf's page 15 , section 5 the fourth paraagraph. You can consider the edge relevance as the edge weight discussed here.

ADD REPLYlink modified 13 months ago • written 13 months ago by arronar200
2

What's written in this section of the paper is simply a thresholding of the adjacency relevance score matrix, i.e. set all values to 0 if they are below a threshold. The 4th paragraph proposes an automatic way of setting the threshold. If you're interested in the limited k-walk approach, I don't think it's available in igraph but I've implemented it in my perl module Algorithm::Graph (using C code from the authors). I haven't implemented the automatic threshold selection though but it is usually not difficult to find a reasonable threshold for a given data type. For example for large protein interaction graphs, I use a default of 0.0001

Edit: clarification

ADD REPLYlink modified 13 months ago • written 13 months ago by Jean-Karim Heriche20k

I don't think that the 4th paragraph is referring to thresholding the adjacency matrix. As I understood they propose to apply the threshold to the edge relevance scores. I have already implemented the k-walk algorithm and the results I got are in the form of the dataframe described in the OP. So now I need some help on how to implement this auto-thresholding. It's kinda confusing me. If you have any idea... :-)

What's this 0.0001? A threshold for the adjacency matrix or for the relevance scores? Here is how the distribution of my edge relevance scores looks like. You suggest applying 0.0001 to this?

enter image description here

ADD REPLYlink modified 13 months ago • written 13 months ago by arronar200
1

Sorry about the confusion. I meant thresholding of relevance scores (which you can view as new edge weights). I've mostly used this for protein interaction graphs so I don't know if the default I use is applicable to other graphs. I usually define a threshold based on exploration of the returned graph, not just on the distribution of values. One way of approaching the problem of finding a threshold automatically could be to find the edges that if removed would disconnect the graph (they are called bridges and I think there's an algorithm for finding them).

ADD REPLYlink written 13 months ago by Jean-Karim Heriche20k

One way of approaching the problem of finding a threshold automatically could be to find the edges that if removed would disconnect the graph (they are called bridges and I think there's an algorithm for finding them).

Isn't what the fourth paragraph is talking about? Also based on what you chose the 0.0001 for protein-protein networks?

ADD REPLYlink modified 13 months ago • written 13 months ago by arronar200

Yes to the first question. Looking for the smallest value associated with bridges will give you the threshold. My value selection is based on inspection of the graphs returned by the algorithm for different thresholds.

ADD REPLYlink written 13 months ago by Jean-Karim Heriche20k

What kind of inspection? Are you looking for graphs with a specific number of expanded nodes or anything different?

ADD REPLYlink written 13 months ago by arronar200

I had mainly two criteria:
- when querying with components of some known complexes or set of interactions, how many of the other known components do we recover in the subgraph ?
- the subgraphs should be of manageable size

You can have different criteria depending on what the graph represents and what is important for you.

ADD REPLYlink modified 13 months ago • written 13 months ago by Jean-Karim Heriche20k

I understood your question to be: "Given a set of nodes in a graph, I want to find the minimum score at which the these nodes are fully connected (i.e. an induced subgraph)." I will look at the paper and edit my answer accordingly.

ADD REPLYlink written 13 months ago by ejm32440

I see... Sorry for letting this misconception.

ADD REPLYlink written 13 months ago by arronar200
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1887 users visited in the last hour