How to construct an adjacency matrix based on output of nearest neighbors search method from RANN package?
8 months ago
Raheleh ▴ 260

Hello everyone,

I'm currently working with a dataset comprising 496 cancer samples, and my aim is to perform class discovery to identify potential clusters within the data. My plan is to construct a graph with the igraph package and then apply Louvain clustering to uncover communities within the dataset.

To start, I've run a nearest neighbors search with the RANN package as a first step towards an adjacency matrix. The outcome of this step is a list with two elements: nn.idx, which provides the nearest neighbor indices, and nn.dists, which holds the corresponding Euclidean distances. Both matrices have dimensions of 496 x 10, holding the nearest-neighbor information for each sample.

I'm seeking guidance on how to proceed from here to generate a square adjacency matrix suitable for building a graph object. I truly appreciate any assistance or insights you can provide.

Here's the script I've used for the nearest neighbors search:

neighbors <- RANN::nn2(df.dat.2)

This is a glimpse of how my data looks:

df.dat.2[1:6,1:5]
                Cytotoxic_lymphocytes   NK_cells IMMUNE_CD8MACRO_GALON IMMUNE_TREG_PASTILLE IMMUNE_TH1_GALON
DFR18201125_S2              1.0847794 -0.7873059             2.0384016            2.3400656        1.6245264
CMR18160811_S14             1.4107888  0.7359309             2.3173862           -0.6108772        1.1848030
AKNR18200612_S1             1.4906686  1.4865404             1.8352200            0.4802375        1.5588536
X211_S39                   -0.2367320 -0.4840597            -1.4358730            0.5417923       -1.1066911
X213_S41                   -0.6760709 -0.5416064            -2.3613827            0.1748922       -1.6711970
X214_S42                   -1.0435179 -0.4920662            -0.8881929           -0.6243585       -0.8318914

Thank you for your insights and suggestions in advance.

igraph RANN adjacency-matrix • 994 views
8 months ago
bk11 ★ 2.4k
library(igraph)

#Create a weighted graph from the pairwise Pearson correlations between samples (the rows of df.dat.2).
g <- graph_from_adjacency_matrix(as.matrix(as.dist(cor(t(df.dat.2), method="pearson"))), mode="undirected", weighted=TRUE, diag=FALSE)

#If you want to use Euclidean distances between samples instead of correlations,
g <- graph_from_adjacency_matrix(as.matrix(dist(df.dat.2, method="euclidean")), mode="undirected", weighted=TRUE, diag=FALSE)

If you are working with single-cell data, you could do something like the following, shown here with the pbmc3k dataset:

library(SingleCellExperiment)
library(Seurat)
library(scater)
library(scran)
library(igraph)
library(SeuratData)
library(pheatmap)
library(mclust)

#InstallData("pbmc3k")
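pbmc <- LoadData("pbmc3k", type = "pbmc3k.final")
#Update the object structure to match the installed Seurat version
pbmc.updated <- UpdateSeuratObject(pbmc)

pbmc.sce <- as.SingleCellExperiment(pbmc.updated)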
pbmc.sce

#create SNN graph
graph_k10 <- scran::buildSNNGraph(pbmc.sce, k = 10, use.dimred = "PCA", type = "rank")

#community detection in igraph using the walktrap algorithm and the Louvain method
clust_k10_walktrap <- igraph::cluster_walktrap(graph_k10)$membership
clust_k10_louvain <- igraph::cluster_louvain(graph_k10)$membership

table(clust_k10_walktrap)
table(clust_k10_louvain)

table(clust_k10_walktrap, clust_k10_louvain)

## Add cluster assignments to the SingleCellExperiment object and visualize in
## UMAP representation
pbmc.sce$cluster_walktrap_k10 <- factor(clust_k10_walktrap)
pbmc.sce$cluster_louvain_k10 <- factor(clust_k10_louvain)
scater::plotReducedDim(pbmc.sce, "UMAP", colour_by = "cluster_walktrap_k10")

scater::plotReducedDim(pbmc.sce, "UMAP", colour_by = "cluster_louvain_k10")

## Define a set of colors to use (must be at least as many as the number of
## communities)
cols <- RColorBrewer::brewer.pal(n = 12, name = "Paired")
## Plot the graph, color by cluster assignment
igraph::plot.igraph(
  graph_k10, layout = layout_with_fr(graph_k10),
  vertex.color = cols[clust_k10_walktrap],
  vertex.size = 5, vertex.label = NA, main = "Walktrap"
)

igraph::plot.igraph(
  graph_k10, layout = layout_with_fr(graph_k10),
  vertex.color = cols[clust_k10_louvain],
  vertex.size = 5, vertex.label = NA, main = "Louvain"
)


Thanks bk11, but I want to build the graph adjacency from the output of the nn2 method in the RANN package, not from a full correlation or Euclidean distance matrix.


You need to use the nn.idx element and define k (the maximum number of nearest neighbours to compute). Then do something like this:

rm(list=ls())
library(RANN)
library(igraph)
library(igraph.extensions)
library(leiden)
library(plot.igraph)

# Set the number of genes and samples
num_genes <- 5000
num_samples <- 40

# Generate random gene expression data
set.seed(123)  # For reproducibility
gene_expression <- matrix(rnorm(num_genes * num_samples), ncol = num_samples)

# Add row and column names for clarity
rownames(gene_expression) <- paste("Gene", 1:num_genes, sep = "")
colnames(gene_expression) <- paste("S", 1:num_samples, sep = "")

head(gene_expression)

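# Indices of the k nearest neighbours of each sample (rows of the transposed matrix are samples).
# Note: the first column of nn.idx is normally the sample itself, so the diagonal ends up as 1.
snn <- RANN::nn2(t(gene_expression), k=30)$nn.idx
adjacency_matrix <- matrix(0L, ncol(gene_expression), ncol(gene_expression))
rownames(adjacency_matrix) <- colnames(adjacency_matrix) <- colnames(gene_expression)
# Mark each sample's neighbours with 1 in its row
for(ii in seq_len(ncol(gene_expression))) {
  adjacency_matrix[ii,colnames(gene_expression)[snn[ii,]]] <- 1L
}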

graph_object <- graph_from_adjacency_matrix(adjacency_matrix, mode = "directed")
#plot_directed(graph_object, cex.arrow = 0.3, col.arrow = "grey50")

partition <- leiden(adjacency_matrix)
table(partition)
library("RColorBrewer")
node.cols <- brewer.pal(max(partition),"Pastel1")[partition]
plot_directed(graph_object, cex.arrow = 0.3, col.arrow = "grey50", fill.node = node.cols)

This is for your reference- https://github.com/TomKellyGenetics/leiden
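
Since the original question asked for igraph plus Louvain, here is a minimal sketch of the same idea without the loop, ending in igraph::cluster_louvain. It rests on several assumptions that are not from the thread: the toy matrix x stands in for df.dat.2 (samples in rows), k + 1 neighbours are requested so that the self-match can be dropped, and the directed kNN relation is symmetrised with pmax() because Louvain in igraph expects an undirected graph. Treat it as one possible approach, not the definitive one.

```r
library(RANN)
library(igraph)

set.seed(1)
# Toy stand-in for the real data: 60 samples (rows) x 5 features (columns)
x <- matrix(rnorm(60 * 5), nrow = 60,
            dimnames = list(paste0("S", 1:60), NULL))

k  <- 10
# Ask for k + 1 neighbours: each sample is normally returned as its own first neighbour
nn <- RANN::nn2(x, k = k + 1)

# Fill an n x n 0/1 matrix: row i gets a 1 in every column listed in nn.idx[i, ]
n   <- nrow(x)
adj <- matrix(0L, n, n, dimnames = list(rownames(x), rownames(x)))
adj[cbind(rep(seq_len(n), times = ncol(nn$nn.idx)), as.vector(nn$nn.idx))] <- 1L
diag(adj) <- 0L   # drop the self-matches

# Symmetrise: keep an edge if either sample lists the other as a neighbour
adj_sym <- pmax(adj, t(adj))

g  <- graph_from_adjacency_matrix(adj_sym, mode = "undirected")
cl <- cluster_louvain(g)
membership_vec <- membership(cl)
table(membership_vec)
```

The nn.dists element is not used here. If edge weights are wanted, the distances would first have to be turned into similarities, because Louvain treats larger weights as stronger connections.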


Thanks so much bk11! Are there any specific criteria or guidelines that I should consider when selecting the k value for my gene expression data?

I also got this error when running this part of the script:

> partition <- leiden(adjacency_matrix)
Error: ImportError: DLL load failed while importing _igraph: The specified module could not be found.

Do you know how to resolve the issue?

I am working with R and this is my session info:

Sorry, the site doesn't let me post the session info here because of invalid characters!


Are there any specific criteria or guidelines that I should consider when selecting the k value for my gene expression data?

k is the maximum number of nearest neighbours to compute. In nn2() it defaults to min(10, nrow(data)), which is why your nn.idx and nn.dists matrices have 10 columns.
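
There is no single correct k as far as I know, so one pragmatic option is to try a few values and see how stable the clustering is. The sketch below is only an illustration under assumptions of my own: knn_graph() is a hypothetical helper, not a RANN or igraph function, the two-group toy matrix stands in for the real data, and the candidate values in ks are arbitrary.

```r
library(RANN)
library(igraph)

# Hypothetical helper: symmetric kNN graph from a samples-x-features matrix
knn_graph <- function(x, k) {
  idx <- RANN::nn2(x, k = k + 1)$nn.idx   # k + 1 to allow for the self-match
  n   <- nrow(x)
  adj <- matrix(0L, n, n)
  adj[cbind(rep(seq_len(n), times = ncol(idx)), as.vector(idx))] <- 1L
  diag(adj) <- 0L                         # remove self-loops
  graph_from_adjacency_matrix(pmax(adj, t(adj)), mode = "undirected")
}

set.seed(42)
# Toy data: two shifted groups of 40 samples each, 5 features
x <- rbind(matrix(rnorm(40 * 5), ncol = 5),
           matrix(rnorm(40 * 5, mean = 4), ncol = 5))

# Arbitrary candidate values of k
ks <- c(5, 10, 15, 20)
scan <- do.call(rbind, lapply(ks, function(k) {
  cl <- cluster_louvain(knn_graph(x, k))
  data.frame(k = k, n_clusters = length(cl), modularity = modularity(cl))
}))
print(scan)
```

As a rough guide, a small k tends to fragment the graph into many small communities, while a large k tends to merge them. A range of k over which the number of clusters stays the same is a reasonable place to start.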

Error: ImportError: DLL load failed while importing _igraph: The specified module could not be found.

Did you install leiden? If not, run install.packages("leiden") or devtools::install_github("TomKellyGenetics/leiden"). You may additionally have to run reticulate::py_install("python-igraph") and reticulate::py_install("leidenalg"). Please make sure that reticulate is installed first.
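
If the Python side keeps failing, a possible workaround is to stay in R: recent versions of igraph ship their own cluster_leiden(). This is a different implementation from the leiden package, so the partitions may not match exactly. The sketch assumes an undirected graph and uses a random symmetric toy matrix in place of the kNN adjacency matrix built above.

```r
library(igraph)

set.seed(123)
# Toy symmetric 0/1 matrix standing in for a symmetrised kNN adjacency matrix
n   <- 30
adj <- matrix(rbinom(n * n, 1, 0.2), n, n)
adj <- pmax(adj, t(adj))   # make it symmetric
diag(adj) <- 0             # no self-loops

# Build an undirected graph, as assumed for cluster_leiden() here
g <- graph_from_adjacency_matrix(adj, mode = "undirected")

# Optimise modularity, to stay close to the Louvain objective
partition <- as.integer(membership(cluster_leiden(g, objective_function = "modularity")))
table(partition)
```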
