Question

Clustering data types other than scRNA-seq using Seurat

0

1 day ago

npont ▴ 20

Hi all,

I have generated a count matrix from a gDNA-seq experiment. The values in the matrix correspond to the methylation level of a given gene in a given cell. These levels were obtained by applying log-likelihood models that account for each cell's background methylation (this is only for context; it is not what really matters here).

I would like to cluster the cells, as is done for scRNA-seq, in order to annotate cell types, but based on methylation instead (there is a rationale behind this; it is not a random attempt).

Has anyone performed this type of analysis before (not necessarily on methylation values, but on any data that are not RNA-seq)?

I first tried to run a PCA (after the normalization, variable-feature selection, and scaling steps, because Seurat's RunPCA would not work without the layers generated by those steps); however, it throws the following error:

Error: None of the requested features have any variance

Any help or insights would be much appreciated!

Thank you :)

countmatrix scrna-seq methylation seurat • 310 views

ADD COMMENT • link updated 13 hours ago by Mensur Dlakic ★ 30k • written 1 day ago by npont ▴ 20

1

I would skip Seurat entirely here (and generally), and rather use more transparent methods, such as the ones in the Bioconductor framework. Can you share an example of the data?

ADD REPLY • link 21 hours ago by ATpoint 89k

0

Thank you ATpoint and jared.andrews07 for your help :)

My data is really just a sparse matrix with genes as rows and cells as columns. The values are mostly between -3 and +3. A positive value for a gene in a given cell means that the methylation fraction of that gene is higher than the cell's overall methylation fraction; a negative value means the opposite.

In my experiment, gene methylation correlates with gene expression, so I want to check whether I can annotate cell types based on these methylation data. This is why I want to perform clustering.

Would the Bioconductor framework work in that context? I have never used it, so to be honest I don't know what I should use.

Thank you again for your time and advice; it is really appreciated.

ADD REPLY • link 19 hours ago by npont ▴ 20

1

Sure, Bioc is generic. See, for example, this guide: https://bioconductor.org/books/3.21/OSCA.basic/clustering.html#implementation

The code examples there start from a SingleCellExperiment, but the help pages of the functions will show you that they can all take generic inputs such as a (sparse) matrix as well. It comes down to running feature selection, dimensionality reduction, and then some sort of clustering on your data.
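The three steps named above can be sketched on a toy matrix. This is only an illustration under stated assumptions: it is written in Python (NumPy, SciPy, scikit-learn) rather than with the Bioconductor functions from the linked guide, the data are simulated, the number of selected genes (50), the number of components (10), and k = 2 are arbitrary choices, and k-means simply stands in for "some sort of clustering".

```python
import numpy as np
from scipy import sparse
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Simulated stand-in for the real data: 200 genes x 60 cells, two hidden groups.
n_genes, n_cells = 200, 60
groups = np.repeat([0, 1], n_cells // 2)
dense = rng.normal(0.0, 0.3, size=(n_genes, n_cells))
dense[:20, groups == 1] += 2.0        # the first 20 genes differ between groups
dense[np.abs(dense) < 0.3] = 0.0      # zero out small values to mimic sparsity
scores = sparse.csr_matrix(dense)     # genes as rows, cells as columns

# 1. Feature selection: keep the genes with the highest variance across cells.
mean = np.asarray(scores.mean(axis=1)).ravel()
mean_sq = np.asarray(scores.multiply(scores).mean(axis=1)).ravel()
gene_var = mean_sq - mean**2
top = np.argsort(gene_var)[::-1][:50]

# 2. Dimensionality reduction: PCA on the (now small) dense cells x genes matrix.
cells_by_genes = scores[top, :].T.toarray()
pcs = PCA(n_components=10, random_state=0).fit_transform(cells_by_genes)

# 3. Clustering: k-means with an assumed k of 2.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(pcs)
```

On real data the number of clusters is not known in advance, so a graph-based or hierarchical method, or a scan over several values of k, may be a better fit than a fixed k.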

ADD REPLY • link 16 hours ago by ATpoint 89k

score 2 · Answer 1 · 2025-10-23

The standard Seurat workflow may or may not be appropriate for your data. Probably not, given that you said you've already transformed it to account for background and such, so I don't know that the normalization Seurat performs is still appropriate. The error indicates that something in that process is coercing all your values to the same value, hence no variance. It is hard to tell without code and a better idea of what your data look like, though.

You have a matrix, so there's nothing stopping you from running PCA (and other dimensionality reduction methods, e.g. t-SNE and UMAP) and various clustering methods without using Seurat. As a bonus, you'd get to avoid their annoying data structure.
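One way to narrow down a "no variance" error is to look at the per-gene variance of the matrix directly. The sketch below is a generic check in Python (NumPy, SciPy) on a toy matrix, not a reproduction of what Seurat does internally; `gene_variances` is a hypothetical helper, and applying it to the matrix exported after each preprocessing step is an assumption about how one might debug this.

```python
import numpy as np
from scipy import sparse


def gene_variances(mat):
    """Population variance of each row (gene) of a genes x cells matrix."""
    mat = sparse.csr_matrix(mat)
    mean = np.asarray(mat.mean(axis=1)).ravel()
    mean_sq = np.asarray(mat.multiply(mat).mean(axis=1)).ravel()
    return mean_sq - mean**2


# Toy matrix: gene 0 varies, gene 1 is constant, gene 2 is all zeros.
toy = sparse.csr_matrix(np.array([
    [-1.5, 0.0, 2.0, 0.5],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]))

var = gene_variances(toy)
flat_genes = np.flatnonzero(np.isclose(var, 0.0))   # genes with no variance
all_finite = bool(np.isfinite(toy.data).all())      # False would mean NaN or Inf values

print("zero-variance genes:", flat_genes)
print("all stored values finite:", all_finite)
```

If every gene ends up in `flat_genes` after a given step, or if the finiteness check fails, that step is a likely source of the error.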

score 0 · Answer 2 · 2025-10-24

As you were told, any dimensionality reduction method could (and probably should) be used here outside of the Seurat package.

In my experience, PCA does not work with sparse data (centering the matrix would make it dense), but truncated SVD does. You might get t-SNE to work with sparse data, but I think in that case the exact method must be used (the angle parameter, sometimes called theta, must be set to zero), which will be extremely slow for large matrices. Alternatively, you could use truncated SVD to reduce the sparse dataset to 30-50 dense components, then feed those into regular t-SNE.

UMAP is your best bet for working natively with sparse data without any additional manipulation.

My suggestion is to try all the methods listed above, and possibly others, and then decide which visualization works best for you.
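The truncated SVD followed by t-SNE route could look roughly like the following. This is a minimal sketch assuming scikit-learn; the random toy matrix, the choice of 30 components, and the perplexity are placeholders, and a genes x cells matrix like the one in the question would need to be transposed first so that cells are rows.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE

# Toy sparse cells x genes matrix (300 cells, 1000 genes, about 5% non-zero).
cells_by_genes = sparse.random(300, 1000, density=0.05, format="csr", random_state=0)

# Truncated SVD accepts sparse input because it does not center the data.
svd = TruncatedSVD(n_components=30, random_state=0)
reduced = svd.fit_transform(cells_by_genes)   # dense array, 300 x 30

# Regular (Barnes-Hut) t-SNE on the dense components.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(reduced)       # 300 x 2 coordinates for plotting
```

The same `reduced` array can also be passed to a clustering method, so the reduction only has to be computed once.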