Question

Create Gene Network analysis in R

8.4 years ago

sequence_hard ▴ 30

Hey everyone!

I have a data frame in R whose columns are gene names and whose rows are isolate names. Each value is either 1 ("gene is present in isolate") or 0 ("gene not present in isolate"), so the data frame records in 0/1 fashion which isolate has which genes.

Now I want to do a network analysis to see which genes are most likely to co-occur, assess the strength of their connection and so on.

In R, I have tried the following:

library(igraph)
library(network)
library(sna)
library(ndtv)

Genematrix <- data.matrix(df)
g <- network(Genematrix, directed=FALSE)
summary(g)
plot(g)

What I get from this is a network object with 236 vertices. But what I actually want is the genes as vertices (21 columns), so I can see the clusters and connections between them. In many tutorials I have seen that I need an edge list and a node list. I think the edge list is what I get from g <- network(Genematrix, directed=FALSE), but I don't know how to get the node list.

Can anyone explain how to solve my problem and what I have to do to get the network I want?

gene network R • 4.3k views

Updated 21 months ago by Ram • written 8.4 years ago by sequence_hard

Ram · Accepted Answer · 2015-12-17

Your columns are the gene names, and what you want is a gene-gene adjacency matrix to feed into igraph, network, etc. You could also do this with an incidence matrix (rows = vertices, columns = edges). Your current data frame looks almost like an incidence matrix but isn't one: a column can have more than two nonzero entries, so the columns don't represent edges, and the vertices you want in your graph are in the columns rather than the rows.

The following should convert your data frame into an adjacency matrix, where the edge between vertices u and v is weighted by the number of isolates in which both are 1:

m <- as.matrix(df) # note: df is also the name of an existing R function (stats::df), so it isn't a very good variable name
adj.m <- t(m) %*% m
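
As a minimal sketch of what that product computes, here is a made-up matrix with 4 isolates and 3 genes (the isolate and gene names are hypothetical, not taken from the question):

```r
# Toy presence/absence matrix: rows = isolates, columns = genes (all names invented)
m <- matrix(c(1, 1, 0,
              1, 0, 1,
              1, 1, 1,
              0, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("iso", 1:4),
                            c("geneA", "geneB", "geneC")))

# Gene-by-gene co-occurrence counts; crossprod(m) gives the same result
adj.m <- t(m) %*% m
adj.m
#       geneA geneB geneC
# geneA     3     2     2
# geneB     2     3     1
# geneC     2     1     2
```

Each off-diagonal entry counts the isolates that carry both genes: geneA and geneB are both present in iso1 and iso3, hence the 2.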

Note that the diagonal entries of adj.m (each gene paired with itself, i.e. the number of isolates carrying that gene) are of no value in your analysis, so I think you can do something like the following to get rid of them:

diag(adj.m) <- 0
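
To get from the adjacency matrix to a graph with genes as vertices, one possible next step is igraph's graph_from_adjacency_matrix(). This is only a sketch on made-up data (the isolate and gene names are hypothetical); with real data you would pass in the adj.m computed from your own matrix:

```r
library(igraph)

# Toy presence/absence matrix: rows = isolates, columns = genes (all names invented)
m <- matrix(c(1, 1, 0,
              1, 0, 1,
              1, 1, 1,
              0, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("iso", 1:4),
                            c("geneA", "geneB", "geneC")))

adj.m <- t(m) %*% m   # gene-by-gene co-occurrence counts
diag(adj.m) <- 0      # drop self-pairs so they do not become loops

# Undirected graph; nonzero matrix entries become edges with a "weight" attribute
g <- graph_from_adjacency_matrix(adj.m, mode = "undirected", weighted = TRUE)

vcount(g)      # number of vertices, one per gene (column)
V(g)$name      # vertex names are taken from the matrix dimnames
E(g)$weight    # co-occurrence count on each edge
```

From here, something like plot(g, edge.width = E(g)$weight) draws the genes as vertices, with thicker edges for gene pairs that co-occur in more isolates.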

Russ