Question

How to construct an adjacency matrix based on output of nearest neighbors search method from RANN package?

0

8 months ago

Raheleh ▴ 260

Hello everyone,

I'm currently working with a dataset of 496 cancer samples, and my aim is to perform class discovery to identify potential clusters within the data. My plan is to construct a graph with the igraph package and then apply Louvain clustering to uncover communities within the dataset.

To start, I've run a nearest neighbors search with the RANN package as the first step toward an adjacency matrix. The result is a list with two elements: nn.idx, which holds the nearest neighbor indices, and nn.dists, which holds the corresponding Euclidean distances. Both are matrices of dimension 496 x 10, giving the nearest-neighbor information for each sample.

I'm seeking guidance on how to proceed from here to generate a square adjacency matrix suitable for building a graph object. I truly appreciate any assistance or insights you can provide.

Here's the script I've used for the nearest neighbors search:

neighbors <- RANN::nn2(df.dat.2)

This is a glimpse of how my data looks:

df.dat.2[1:6,1:5]
                Cytotoxic_lymphocytes   NK_cells IMMUNE_CD8MACRO_GALON IMMUNE_TREG_PASTILLE IMMUNE_TH1_GALON
DFR18201125_S2              1.0847794 -0.7873059             2.0384016            2.3400656        1.6245264
CMR18160811_S14             1.4107888  0.7359309             2.3173862           -0.6108772        1.1848030
AKNR18200612_S1             1.4906686  1.4865404             1.8352200            0.4802375        1.5588536
X211_S39                   -0.2367320 -0.4840597            -1.4358730            0.5417923       -1.1066911
X213_S41                   -0.6760709 -0.5416064            -2.3613827            0.1748922       -1.6711970
X214_S42                   -1.0435179 -0.4920662            -0.8881929           -0.6243585       -0.8318914

Thank you for your insights and suggestions in advance.

igraph RANN adjacency-matrix • 994 views

ADD COMMENT • link updated 8 months ago by bk11 ★ 2.4k • written 8 months ago by Raheleh ▴ 260

score 1 · Answer 1 · 2023-08-29

1

8 months ago

bk11 ★ 2.4k

library(igraph)

#Build a weighted, undirected graph from the pairwise Pearson correlations between samples (the rows of df.dat.2).
#graph_from_adjacency_matrix() is the current name for the deprecated graph.adjacency().
g <- graph_from_adjacency_matrix(as.matrix(as.dist(cor(t(df.dat.2), method="pearson"))), mode="undirected", weighted=TRUE, diag=FALSE)

#If you want to use Euclidean distances instead of correlation:
g <- graph_from_adjacency_matrix(as.matrix(dist(df.dat.2, method="euclidean")), mode="undirected", weighted=TRUE, diag=FALSE)

If you are working with single-cell data, you could do something like the following, shown here with the pbmc3k data:

library(SingleCellExperiment)
library(Seurat)
library(scater)
library(scran)
library(igraph)
library(SeuratData)
library(pheatmap)
library(mclust)

#InstallData("pbmc3k")
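pbmc <- LoadData("pbmc3k", type = "pbmc3k.final")
#Update the object to the current Seurat structure before converting it
pbmc.updated <- UpdateSeuratObject(pbmc)

pbmc.sce <- as.SingleCellExperiment(pbmc.updated)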
pbmc.sce

#create SNN graph
graph_k10 <- scran::buildSNNGraph(pbmc.sce, k = 10, use.dimred = "PCA", type = "rank")

#community detection in igraph using the walktrap algorithm and the Louvain method
clust_k10_walktrap <- igraph::cluster_walktrap(graph_k10)$membership
clust_k10_louvain <- igraph::cluster_louvain(graph_k10)$membership

table(clust_k10_walktrap)
table(clust_k10_louvain)

table(clust_k10_walktrap, clust_k10_louvain)

## Add cluster assignments to the SingleCellExperiment object and visualize in
## UMAP representation
pbmc.sce$cluster_walktrap_k10 <- factor(clust_k10_walktrap)
pbmc.sce$cluster_louvain_k10 <- factor(clust_k10_louvain)
scater::plotReducedDim(pbmc.sce, "UMAP", colour_by = "cluster_walktrap_k10")

scater::plotReducedDim(pbmc.sce, "UMAP", colour_by = "cluster_louvain_k10")

## Define a set of colors to use (must be at least as many as the number of
## communities)
cols <- RColorBrewer::brewer.pal(n = 12, name = "Paired")
## Plot the graph, color by cluster assignment
igraph::plot.igraph(
  graph_k10, layout = layout_with_fr(graph_k10),
  vertex.color = cols[clust_k10_walktrap],
  vertex.size = 5, vertex.label = NA, main = "Walktrap"
)

igraph::plot.igraph(
  graph_k10, layout = layout_with_fr(graph_k10),
  vertex.color = cols[clust_k10_louvain],
  vertex.size = 5, vertex.label = NA, main = "Louvain"
)

ADD COMMENT • link 8 months ago by bk11 ★ 2.4k

0

Thanks bk11, but I want to build the graph adjacency from the output of the nn2 method in the RANN package, not from a full correlation or Euclidean distance matrix.

ADD REPLY • link 8 months ago by Raheleh ▴ 260

1

You need to use the nn.idx element and define k (the maximum number of nearest neighbours to compute). Then do something like this:

rm(list=ls())
library(RANN)
library(igraph)
library(igraph.extensions)
library(leiden)
library(plot.igraph)

# Set the number of genes and samples
num_genes <- 5000
num_samples <- 40

# Generate random gene expression data
set.seed(123)  # For reproducibility
gene_expression <- matrix(rnorm(num_genes * num_samples), ncol = num_samples)

# Add row and column names for clarity
rownames(gene_expression) <- paste("Gene", 1:num_genes, sep = "")
colnames(gene_expression) <- paste("S", 1:num_samples, sep = "")

head(gene_expression)

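# k-nearest-neighbour indices for each sample (nn2 expects samples in rows, hence t()).
# Column 1 of nn.idx is normally the sample itself, at distance 0.
snn <- RANN::nn2(t(gene_expression), k=30)$nn.idx
adjacency_matrix <- matrix(0L, ncol(gene_expression), ncol(gene_expression))
rownames(adjacency_matrix) <- colnames(adjacency_matrix) <- colnames(gene_expression)
# Row ii gets a 1 in the column of each of its neighbours (itself included, so the diagonal is also 1)
for(ii in 1:ncol(gene_expression)) {
  adjacency_matrix[ii,colnames(gene_expression)[snn[ii,]]] <- 1L
}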

graph_object <- graph_from_adjacency_matrix(adjacency_matrix, mode = "directed")
#plot_directed(graph_object, cex.arrow = 0.3, col.arrow = "grey50")

partition <- leiden(adjacency_matrix)
table(partition)
library("RColorBrewer")
node.cols <- brewer.pal(max(partition),"Pastel1")[partition]
plot_directed(graph_object, cex.arrow = 0.3, col.arrow = "grey50", fill.node = node.cols)

For reference, see https://github.com/TomKellyGenetics/leiden
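Since the original goal was Louvain clustering on an undirected graph, here is a minimal sketch of the same idea without the loop. The helper name knn_to_adjacency() and the random stand-in data are assumptions for illustration only; the helper drops the self-neighbour and symmetrises the matrix so that igraph::cluster_louvain(), which needs an undirected graph, can be used directly and no Python is involved.

```r
# Hypothetical helper (not part of RANN or igraph): turn a kNN index matrix,
# such as RANN::nn2(...)$nn.idx, into a square 0/1 adjacency matrix.
knn_to_adjacency <- function(nn_idx, sample_names = NULL, symmetric = TRUE) {
  n <- nrow(nn_idx)
  adj <- matrix(0L, n, n, dimnames = list(sample_names, sample_names))
  # One (row, column) pair per neighbour; as.vector() reads nn_idx column by column
  pairs <- cbind(rep(seq_len(n), times = ncol(nn_idx)), as.vector(nn_idx))
  adj[pairs] <- 1L
  diag(adj) <- 0L                          # remove self-loops (each sample is its own nearest neighbour)
  if (symmetric) adj <- pmax(adj, t(adj))  # keep an edge if either sample lists the other
  adj
}

# Toy index matrix: 3 samples, k = 2 (the sample itself plus one neighbour)
toy_idx <- rbind(c(1L, 2L), c(2L, 1L), c(3L, 2L))
toy_adj <- knn_to_adjacency(toy_idx, sample_names = c("A", "B", "C"))
print(toy_adj)

# Same steps on random stand-in data, run only if RANN and igraph are installed
if (requireNamespace("RANN", quietly = TRUE) && requireNamespace("igraph", quietly = TRUE)) {
  set.seed(1)
  expr <- matrix(rnorm(50 * 5), nrow = 50)   # stand-in for df.dat.2 (samples in rows)
  rownames(expr) <- paste0("S", seq_len(50))
  nn <- RANN::nn2(expr, k = 11)              # 10 neighbours plus the sample itself
  adj <- knn_to_adjacency(nn$nn.idx, rownames(expr))
  g <- igraph::graph_from_adjacency_matrix(adj, mode = "undirected")
  communities <- igraph::cluster_louvain(g)
  print(table(igraph::membership(communities)))
}
```

With real data you would pass your own samples-by-features matrix in place of expr. Whether to symmetrise with "either lists the other" (as here) or "both list each other" is a design choice; the second gives a sparser graph.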

ADD REPLY • link 8 months ago by bk11 ★ 2.4k

0

Thanks so much bk11! Are there any specific criteria or guidelines that I should consider when selecting the k value for my gene expression data?

I also got this error when running this part of the script:

> partition <- leiden(adjacency_matrix)
Error: ImportError: DLL load failed while importing _igraph: The specified module could not be found.

Do you know how to resolve the issue?

I am working with R and this is my session info:

Sorry, it doesn't let me post the session info here because of invalid characters!

ADD REPLY • link 8 months ago by Raheleh ▴ 260

0

Are there any specific criteria or guidelines that I should consider when selecting the k value for my gene expression data?

k is the maximum number of nearest neighbours to compute. In nn2() it defaults to min(10, nrow(data)), which is why your nn.idx and nn.dists matrices have 10 columns.
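There is no fixed rule for k in the RANN documentation. One pragmatic option, offered here only as a sketch, is to build the graph for a few k values and see how much the result changes. The helper knn_adjacency(), the simulated two-group data, and the chosen k values are all assumptions for illustration; the helper uses base R distances so it runs without extra packages, and RANN::nn2(x, k = k + 1)$nn.idx[, -1] should give the same neighbours faster on larger data.

```r
# Hypothetical helper: symmetric kNN adjacency matrix from a samples-by-features matrix
knn_adjacency <- function(x, k) {
  d <- as.matrix(dist(x))   # pairwise Euclidean distances between samples
  diag(d) <- Inf            # a sample is never its own neighbour
  n <- nrow(d)
  adj <- matrix(0L, n, n)
  for (i in seq_len(n)) adj[i, order(d[i, ])[seq_len(k)]] <- 1L
  pmax(adj, t(adj))         # undirected: keep an edge if either sample lists the other
}

set.seed(42)
# Simulated data: two groups of 40 samples with shifted means (illustration only)
x <- rbind(matrix(rnorm(40 * 5, mean = 0), ncol = 5),
           matrix(rnorm(40 * 5, mean = 4), ncol = 5))

k_values <- c(5, 10, 20)
mean_degree <- sapply(k_values, function(k) mean(rowSums(knn_adjacency(x, k))))
print(data.frame(k = k_values, mean_degree = mean_degree))

# Number of Louvain communities per k, run only if igraph is installed
if (requireNamespace("igraph", quietly = TRUE)) {
  n_communities <- sapply(k_values, function(k) {
    g <- igraph::graph_from_adjacency_matrix(knn_adjacency(x, k), mode = "undirected")
    length(igraph::cluster_louvain(g))
  })
  print(data.frame(k = k_values, n_communities = n_communities))
}
```

Smaller k gives a sparser graph and tends to produce more, smaller communities; larger k connects more samples and tends to merge them. A range of k over which the community assignments stay similar is a reasonable place to settle.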

Error: ImportError: DLL load failed while importing _igraph: The specified module could not be found.

Did you install leiden? If not, run install.packages("leiden") or devtools::install_github("TomKellyGenetics/leiden"). The R package calls Python through reticulate, so please make sure reticulate is installed; you may also have to run reticulate::py_install("python-igraph") and reticulate::py_install("leidenalg").
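If the Python setup keeps failing, one way around it (a different route from the leiden package above, not a fix for the DLL error itself) is the Leiden implementation built into recent igraph releases, which does not go through reticulate. The sketch below uses a made-up toy graph of two 4-node cliques joined by one edge; with your data you would pass your own symmetric adjacency matrix instead.

```r
# Toy undirected adjacency matrix (illustration only): two 4-node cliques plus one bridge
adj <- matrix(0L, 8, 8)
adj[1:4, 1:4] <- 1L
adj[5:8, 5:8] <- 1L
adj[4, 5] <- adj[5, 4] <- 1L
diag(adj) <- 0L

# cluster_leiden() needs an undirected graph; run only if igraph is installed
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_adjacency_matrix(adj, mode = "undirected")
  set.seed(1)  # the algorithm has a random component
  leiden_fit <- igraph::cluster_leiden(g, objective_function = "modularity")
  leiden_membership <- as.integer(igraph::membership(leiden_fit))
  print(table(leiden_membership))
}
```

Note that a directed kNN matrix, like the one built in the loop earlier in this thread, would have to be symmetrised first, for example with pmax(adjacency_matrix, t(adjacency_matrix)).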

ADD REPLY • link 8 months ago by bk11 ★ 2.4k