Question

Using Leiden algorithm to cluster bulk RNA samples


2.1 years ago

nhaus ▴ 310

Hello,

I am currently analyzing a publicly available dataset with ~200 samples. I want to cluster the samples based on their expression to identify disease endotypes. I have already tried hclust and NMF. I know that the Leiden algorithm is often used in single-cell analysis and performs quite well there, so my idea was to try it here as well. In practice, I would simply pretend that my bulk RNA-seq samples are "cells" so that I can use Seurat to perform the clustering steps.

However, I did not find any papers that used the Leiden algorithm to cluster bulk RNA-seq samples. This makes me wonder whether I am overlooking something, and whether the Leiden algorithm or my approach (pretending samples are cells so I can use Seurat) is unsuitable.

I would appreciate any insights or comments on this. Thanks!

clustering RNAseq Leiden • 1.1k views


score 4 · Accepted Answer · 2022-03-30

As far as I know, Leiden clustering is usually simply not needed for most bulk RNA-seq experiments because the usual clustering methods seem to work well enough. Why do you feel that hclust, for example, doesn't do it for your data?

That being said, I don't see any obvious reason not to apply graph-based clustering. I would recommend against using Seurat, though, since your data isn't actual single-cell data and some of Seurat's default settings are meant for the sparse, zero-rich matrices of single cells. It's probably better to run the graph building and the community detection (igraph package) as separate steps. You can read more about it here and here.
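To make the "graph building, then community detection" split concrete, here is a minimal sketch in Python. It is not from the linked material. It uses python-igraph's `community_leiden` rather than the R igraph package, and everything else in it is an assumption for illustration: the helper names (`pearson`, `knn_edges`, `leiden_membership`, `toy_samples`), the choice of a k-nearest-neighbour graph on Pearson correlation, the edge weight `(1 + r) / 2`, and the made-up toy data. Each sample is assumed to be a vector of already normalized, log-scale expression values.

```python
import math
import random

try:
    import igraph as ig  # python-igraph; only needed for the Leiden step
except ImportError:
    ig = None


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def knn_edges(samples, k):
    """Step 1: build an undirected k-nearest-neighbour graph over samples.

    Returns a sorted list of (i, j, weight) with i < j. The weight
    (1 + r) / 2 rescales correlation to [0, 1] (an illustrative choice).
    """
    weights = {}
    n = len(samples)
    for i in range(n):
        sims = [(pearson(samples[i], samples[j]), j) for j in range(n) if j != i]
        sims.sort(reverse=True)  # most similar samples first
        for r, j in sims[:k]:
            weights[(min(i, j), max(i, j))] = (1.0 + r) / 2.0
    return [(i, j, w) for (i, j), w in sorted(weights.items())]


def leiden_membership(n_samples, edges):
    """Step 2: Leiden community detection on the graph from step 1."""
    if ig is None:
        raise RuntimeError("python-igraph is not installed")
    g = ig.Graph(n=n_samples, edges=[(i, j) for i, j, _ in edges])
    g.es["weight"] = [w for _, _, w in edges]
    part = g.community_leiden(objective_function="modularity", weights="weight")
    return part.membership  # one cluster label per sample


def toy_samples(seed=0):
    """Eight made-up samples over six genes, in two groups of four."""
    rng = random.Random(seed)
    group_a = [10, 10, 10, 0, 0, 0]
    group_b = [0, 0, 0, 10, 10, 10]
    bases = [group_a] * 4 + [group_b] * 4
    return [[v + rng.gauss(0, 0.5) for v in base] for base in bases]


if __name__ == "__main__":
    samples = toy_samples()
    edges = knn_edges(samples, k=2)
    print(len(edges), "edges in the kNN graph")
    if ig is not None:
        print(leiden_membership(len(samples), edges))
```

With ~200 real samples, the number of neighbours `k` and the similarity measure are the main knobs, and the resulting clusters can change noticeably with either, so it is worth checking how stable they are across a few settings.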

This article by Tran et al. (2020) seems to make use of graph-based clustering for bulk RNA-seq data, too (although with a different goal, I believe); they may give some hints for practical considerations.