Clustering data types other than scRNA-seq using Seurat
1 day ago
npont ▴ 20

Hi all,

I have generated a count matrix from a gDNA-seq experiment. The values in the matrix actually correspond to the methylation level of a given gene in a given cell. These levels were obtained by applying log-likelihood models that account for each cell's background methylation (just to give you some context; that's not truly what matters here).

I would like to perform a clustering similar to what is done in scRNA-seq in order to annotate cell types, but based on methylation instead (there is a rationale behind this; it is not a random attempt).

Has anyone ever performed this type of analysis (not necessarily on methylation values, but on any data that isn't RNA-seq)?

I first tried to run a PCA (after the normalization, variable-feature selection and scaling steps, because Seurat's RunPCA would not work without the layers generated by those steps). However, it throws the following error:

Error: None of the requested features have any variance

Any help or insights would be much appreciated!

Thank you :)

countmatrix scrna-seq methylation seurat • 310 views

I would skip Seurat entirely here (and generally), and rather use more transparent methods, such as the ones in the Bioconductor framework. Can you share an example of the data?


Thank you ATpoint and jared.andrews07 for your help :)

My data is really just a sparse matrix with genes as rows and cells as columns. The values are mostly between -3 and +3. A positive value for a gene in a given cell means that the methylation fraction in that gene is higher than the cell's overall methylation fraction; a negative value means the opposite.

In my experiment, gene methylation correlates with gene expression, so I want to check whether I can annotate the cell types based on the methylation data. This is why I want to perform clustering.

Would the Bioconductor framework work in that context? I have never used it, so I don't know what I should use, to be honest.

Thank you again for your time and advice; really appreciated.


Sure, Bioc is generic. See, for example, this guide: https://bioconductor.org/books/3.21/OSCA.basic/clustering.html#implementation

The code examples there start from a SingleCellExperiment, but the help pages of the functions will show you that they can all take generic inputs such as a (sparse) matrix as well. It comes down to running feature selection, dimensionality reduction and then some sort of clustering on your data.
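To make those three steps concrete, here is a minimal sketch in Python with NumPy and SciPy rather than the Bioconductor functions from the guide. Everything in it is an illustrative assumption: the function name `cluster_cells`, the toy data, selecting features by plain variance, and using Ward hierarchical clustering in place of the graph-based clustering described in the linked chapter.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_cells(scores, n_top_genes=50, n_pcs=5, n_clusters=2):
    """Cluster cells from a dense genes x cells array of scores (hypothetical helper)."""
    # 1. Feature selection: keep the genes with the highest variance across cells.
    gene_var = scores.var(axis=1)
    top = np.argsort(gene_var)[::-1][:n_top_genes]
    x = scores[top].T  # cells x selected genes

    # 2. Dimensionality reduction: PCA via SVD of the gene-centred matrix.
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    pcs = u[:, :n_pcs] * s[:n_pcs]  # cells x n_pcs

    # 3. Clustering: Ward hierarchical clustering, cut into n_clusters groups.
    tree = linkage(pcs, method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy data (made up): 200 genes x 40 cells, two cell groups differing in 20 genes.
rng = np.random.default_rng(0)
scores = rng.normal(0.0, 0.3, size=(200, 40))
scores[:20, 20:] += 2.0

labels = cluster_cells(scores)
print(labels)
```

No count-style normalization is applied here, since the scores are already background-corrected; whether any further scaling is appropriate depends on the data.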

1 day ago

The standard Seurat workflow may or may not be appropriate for your data. Probably not, given you said you've already altered it to account for background and such. I don't know that the normalization Seurat performs is then appropriate. The error indicates something in that process is coercing all your values to the same value, hence no variance. Hard to tell without code and a better idea of what your data looks like though.
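A quick way to narrow this down is to count genes without variance and non-finite values before and after each processing step. The sketch below is in Python with SciPy and uses made-up numbers; the helper names `gene_variance` and `summarize` are hypothetical. The log transform in it is only an assumed example of a count-oriented step applied to signed scores, not a reproduction of what Seurat does internally.

```python
import numpy as np
import scipy.sparse as sp


def gene_variance(mat):
    """Per-gene (row) variance of a sparse genes x cells matrix."""
    mat = sp.csr_matrix(mat, dtype=float)
    mean = np.asarray(mat.mean(axis=1)).ravel()
    mean_sq = np.asarray(mat.multiply(mat).mean(axis=1)).ravel()
    return mean_sq - mean ** 2


def summarize(mat):
    """Count stored non-finite values and genes with zero (or undefined) variance."""
    mat = sp.csr_matrix(mat, dtype=float)
    var = gene_variance(mat)
    return {
        "n_genes": mat.shape[0],
        "n_nonfinite_values": int((~np.isfinite(mat.data)).sum()),
        # NaN variances fail the comparison, so they are counted here as well.
        "n_genes_without_variance": int((~(var > 1e-12)).sum()),
    }


# Toy genes x cells matrix with signed scores (made-up values).
raw = sp.csr_matrix(np.array([
    [0.0, 1.5, -2.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],   # gene with no signal in any cell
    [-3.0, 2.0, 0.0, 1.0],
]))
before = summarize(raw)

# Assumed example: a log1p transform meant for counts, applied to signed scores.
logged = raw.copy()
with np.errstate(invalid="ignore", divide="ignore"):
    logged.data = np.log1p(logged.data)  # log1p(x) is NaN for x < -1
after = summarize(logged)

print(before)
print(after)
```

In this toy case every gene that contains a score below -1 ends up with an undefined variance after the transform, which is the kind of effect worth ruling out on the real matrix.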

You have a matrix, so there's nothing stopping you from running PCA (and other dimensionality reduction methods, e.g. t-SNE and UMAP) and various clustering methods without using Seurat. As a bonus, you'd get to avoid their annoying data structure.

13 hours ago
Mensur Dlakic ★ 30k

As you were told, any dimensionality reduction method could (and probably should) be used here outside of the Seurat package.

In my experience, PCA does not work with sparse data (centering the matrix turns it dense), but truncated SVD will. You might get t-SNE to work with sparse data, but I think in that case the exact method must be used (the angle parameter, sometimes called theta, must be set to zero), which will be extremely slow for large matrices. Alternatively, you could use truncated SVD to reduce the sparse dataset to 30-50 dense components, then feed those into regular t-SNE.
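A minimal sketch of the truncated SVD step, in Python with SciPy's `svds`, which accepts a sparse matrix directly and never centers it. The toy matrix, the helper name `truncated_svd` and the choice of 10 components are assumptions for illustration; on real data you would pick something in the 30-50 range mentioned above.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds


def truncated_svd(cells_by_genes, n_components=10):
    """Project a sparse cells x genes matrix onto its top singular vectors."""
    u, s, _ = svds(cells_by_genes, k=n_components)
    order = np.argsort(s)[::-1]  # sort explicitly: largest singular value first
    return u[:, order] * s[order], s[order]


# Toy sparse matrix (made up): 60 genes x 200 cells, about 10% non-zero, values in [-3, 3].
rng = np.random.default_rng(0)
dense = rng.uniform(-3, 3, size=(60, 200)) * (rng.random((60, 200)) < 0.1)
genes_by_cells = sp.csr_matrix(dense)

# Transpose so that cells are rows, then reduce to a dense cells x components array.
embedding, sing_vals = truncated_svd(genes_by_cells.T.tocsr(), n_components=10)
print(embedding.shape)
```

The resulting `embedding` is a small dense array with one row per cell, so it can be passed to any standard t-SNE or clustering implementation.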

UMAP is your best bet to work natively with sparse data without any additional manipulations.

My suggestion is to try all the methods listed above, and possibly others, and then decide which visualization works best for you.
