Community detection protocol and failed gene enrichment
14 months ago
k0stasmp ▴ 10

My data is a file of about 19,000 genes from 100 patients. I tried to build a network from these data using igraph.

First, I converted all gene names to ENTREZID, which left around 14,000 of the 19,000 genes. I then discarded all genes with zero variance, leaving around 9,000 genes.
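To make the zero-variance step concrete, here is a minimal sketch in plain Python. The gene IDs, the expression values and the helper name `drop_zero_variance` are made up for illustration; this is not the code I actually ran.

```python
from statistics import pvariance

# Hypothetical toy expression table: gene ID -> expression values across patients.
expression = {
    "1017": [2.0, 3.5, 1.0, 4.2],
    "7157": [5.0, 5.0, 5.0, 5.0],  # constant across patients, so variance is zero
    "672": [0.1, 0.4, 0.3, 0.9],
}


def drop_zero_variance(table):
    """Keep only the genes whose expression varies across patients."""
    return {gene: values for gene, values in table.items() if pvariance(values) > 0}


filtered = drop_zero_variance(expression)
print(sorted(filtered))
```

On real data the same filter would run over every gene column, and a small tolerance instead of an exact `> 0` check may be safer for floating-point values.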

It's a lot of code to include in full, but the initial steps I performed are roughly the following:

1) M2 <- graph_from_data_frame(d = mydf[,c("Node_A","Node_B","weight")], vertices = sort(unique(unlist(mydf))))

2) G2 <- igraph::delete.vertices(M2, igraph::graph.strength(M2)==0)

3) M2 <- induced_subgraph( G2, V(G2)[components(G2)$membership == which.max(components(G2)$csize)])

4) M2.subgraph <- mst(M2, algorithm="prim")

5) M2.subgraph.communities <- cluster_louvain(as.undirected(M2.subgraph), weights = E(M2.subgraph)$weight)
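For anyone who wants to see what steps 3 and 4 do without running igraph, here is a rough plain-Python stand-in on a toy edge list. It is only a sketch: the edge list and the function names are hypothetical, it assumes that the minimum spanning tree minimises the total edge weight, and it does not reproduce igraph's implementation.

```python
import heapq
from collections import defaultdict

# Hypothetical toy edge list in the same shape as mydf: (Node_A, Node_B, weight).
edges = [
    ("a", "b", 0.9), ("b", "c", 0.2), ("a", "c", 0.5), ("c", "d", 0.7),
    ("x", "y", 0.4),  # a separate, smaller component
]


def build_adjacency(edge_list):
    """Undirected weighted adjacency map: node -> {neighbour: weight}."""
    adjacency = defaultdict(dict)
    for u, v, w in edge_list:
        adjacency[u][v] = w
        adjacency[v][u] = w
    return adjacency


def largest_component(adjacency):
    """Return the node set of the largest connected component (step 3)."""
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    stack.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def prim_mst(adjacency, nodes):
    """Prim's algorithm restricted to `nodes`; returns tree edges (step 4)."""
    start = min(nodes)
    visited = {start}
    heap = [(w, start, v) for v, w in adjacency[start].items() if v in nodes]
    heapq.heapify(heap)
    tree = []
    while heap and len(visited) < len(nodes):
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)
        tree.append((u, v, w))
        for nxt, w2 in adjacency[v].items():
            if nxt in nodes and nxt not in visited:
                heapq.heappush(heap, (w2, v, nxt))
    return tree


adjacency = build_adjacency(edges)
component = largest_component(adjacency)
tree = prim_mst(adjacency, component)
print(len(component), len(tree))
```

A spanning tree over n vertices always keeps exactly n - 1 edges, so the graph handed to the community detection in step 5 is much sparser than the original network.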

After community detection I used Python's SelectKBest() (scikit-learn) to score how strongly each gene is associated with the traits. I then found the communities that contain the largest number of top-scoring genes, and within the top three communities I used Kleinberg's score to pick the top genes. I used these genes for GO and KEGG enrichment.
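The community-ranking idea can be sketched roughly as follows. This is a plain-Python stand-in, not my actual SelectKBest() call: it assumes the absolute Pearson correlation as the gene score, and the gene names, trait values and community labels are all hypothetical.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical toy data: one trait value per patient, one expression list per gene.
trait = [1.0, 2.0, 3.0, 4.0]
expression = {
    "g1": [1.1, 2.0, 2.9, 4.2],  # rises with the trait
    "g2": [4.0, 3.1, 2.0, 0.9],  # falls with the trait
    "g3": [1.0, 3.0, 1.0, 3.0],  # weakly related
    "g4": [2.0, 1.0, 2.0, 1.5],  # weakly related
}
# Hypothetical community label per gene, e.g. taken from the Louvain membership.
membership = {"g1": 1, "g2": 1, "g3": 2, "g4": 3}


def top_k_genes(table, target, k):
    """Rank genes by absolute correlation with the trait and keep the best k."""
    scores = {gene: abs(pearson(values, target)) for gene, values in table.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


top_genes = top_k_genes(expression, trait, k=2)
# Count how many of the top genes fall into each community.
counts = Counter(membership[gene] for gene in top_genes)
print(counts.most_common(1))
```

The communities with the highest counts are the ones I then looked at in more detail.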

But something seems wrong: Louvain returns around 200 communities and walktrap more than 1,500! Worst of all, KEGG and GO enrichment always return zero results, no matter which thresholds I use!

I haven't tried the WGCNA library yet, but I was wondering what I might have missed in the steps above.

Here is a link to a sample of my data for anyone who is interested: https://drive.google.com/file/d/1MDwcu0Xk-A3uWW8MR_YLCvTKAy8uGFSr/view?usp=sharing

Thanks guys,

Kostas

igraph detection community