Question: Create Gene Network analysis in R
sequence_hard (Sweden) wrote, 4.0 years ago:

Hey everyone!

I have a data frame in R whose columns are gene names and whose rows are isolate names. Each value is 1 ("gene is present in isolate") or 0 ("gene is not present in isolate"), so the data frame records in one-zero fashion which isolate has which genes.
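To make the layout concrete, here is a small made-up data frame in the same shape (rows = isolates, columns = genes, 1 = present, 0 = absent). The isolate names, gene names and values are hypothetical; the real data frame has 21 gene columns.

```r
# Hypothetical toy data in the layout described above:
# rows = isolates, columns = genes, 1 = gene present, 0 = gene absent.
# Named gene_df rather than df, because df() is already an R function.
gene_df <- data.frame(
  geneA = c(1, 1, 0, 1),
  geneB = c(1, 1, 0, 0),
  geneC = c(0, 1, 1, 1),
  row.names = c("isolate1", "isolate2", "isolate3", "isolate4")
)
gene_df
```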

Now I want to do a network analysis to see which genes are most likely to co-occur, assess the strength of their connection and so on.

In R, I have tried the following:

library(igraph)
library(network)
library(sna)
library(ndtv)

Genematrix <- data.matrix(df)               # rows = isolates, cols = genes (0/1)
g <- network(Genematrix, directed = FALSE)  # build an undirected network object
summary(g)
plot(g)

What I get from this is a network object with 236 vertices. But what I actually want is the genes (the 21 columns) as vertices, so I can see the clusters and connections between them. Many tutorials say I need an edge list and a node list. I think the edge list is what I get from g <- network(Genematrix, directed=FALSE), but I don't know how to get the node list.

Can anyone explain how to solve my problem and what I have to do to get the network I want?

Tags: network, R, gene
russhh (UK, U. Glasgow) wrote, 4.0 years ago:

Your columns are the gene names, and you want a gene-by-gene adjacency matrix to feed into igraph/network etc. You could do this with an incidence matrix (rows = vertices, columns = edges) as well. Your current data frame looks almost like an incidence matrix, but it isn't one: some columns may contain more than two 1s, so they don't represent edges, and in any case the vertices you want in your graph are the columns, not the rows.

The following should convert your dataframe into an adjacency matrix (the edge between vertices u and v being weighted by the number of isolates where they were both 1)

m <- as.matrix(df) # note that df() is also an R function (the F-distribution density in stats), so df isn't a very good variable name

adj.m <- t(m) %*% m
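A small worked example of the t(m) %*% m step, on a hypothetical 4-isolate by 3-gene matrix, shows why it yields co-occurrence counts: entry [u, v] is the sum over isolates of m[i, u] * m[i, v], and each product is 1 only when both genes are present in that isolate. Base R's crossprod(m) computes the same product.

```r
# Hypothetical toy presence/absence matrix: rows = isolates, cols = genes
m <- matrix(c(1, 1, 0, 1,    # geneA
              1, 1, 0, 0,    # geneB
              0, 1, 1, 1),   # geneC
            nrow = 4,
            dimnames = list(paste0("isolate", 1:4),
                            c("geneA", "geneB", "geneC")))

# Entry [u, v] = number of isolates in which genes u and v are both present;
# the diagonal [u, u] is simply the number of isolates carrying gene u
adj.m <- t(m) %*% m
adj.m
#       geneA geneB geneC
# geneA     3     2     2
# geneB     2     2     1
# geneC     2     1     3

# crossprod(m) is the same product without forming t(m) explicitly
all.equal(adj.m, crossprod(m))  # TRUE
```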

Note that the diagonal of adj.m holds, for each gene, the number of isolates that carry it. Those self-counts are of no value in a gene-gene network (they would show up as self-loops), so you can get rid of them like this:

diag(adj.m) <- 0
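As a sketch of the next step, assuming igraph is the package you plot with (the toy matrix is again hypothetical), the zero-diagonal matrix can be turned into a weighted, undirected graph with one vertex per gene. igraph's as_data_frame() then returns the node list and edge list that many tutorials ask for, and sorting the edge list by weight ranks gene pairs by the number of isolates they share. The igraph:: prefix is there because packages such as tibble/dplyr also export an as_data_frame().

```r
library(igraph)

# Hypothetical toy presence/absence matrix: rows = isolates, cols = genes
m <- matrix(c(1, 1, 0, 1,
              1, 1, 0, 0,
              0, 1, 1, 1),
            nrow = 4,
            dimnames = list(paste0("isolate", 1:4),
                            c("geneA", "geneB", "geneC")))

adj.m <- t(m) %*% m
diag(adj.m) <- 0  # drop each gene's count with itself (would be self-loops)

# Undirected graph, one vertex per gene; edge weight = number of shared isolates
g <- graph_from_adjacency_matrix(adj.m, mode = "undirected", weighted = TRUE)

# Node list (one row per gene) and edge list (from, to, weight)
nodes <- igraph::as_data_frame(g, what = "vertices")
edges <- igraph::as_data_frame(g, what = "edges")

# Gene pairs ranked by how many isolates they co-occur in
edges[order(edges$weight, decreasing = TRUE), ]

# Draw pairs that co-occur in more isolates with thicker lines
plot(g, edge.width = E(g)$weight)
```

The weights here are raw co-occurrence counts, so genes present in many isolates will tend to get heavier edges simply because they are common.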

 

Russ


sequence_hard replied, 4.0 years ago:

Works, thank you! I'll read up on the why :-)