Hi all,
I have generated a count matrix from a gDNA-seq experiment. The values in the matrix actually correspond to methylation levels in a given gene in a given cell. These levels were obtained by applying log-likelihood models that account for each cell's background methylation (just to give some context; that's not really what matters here).
I would like to perform clustering similar to what is done in scRNA-seq in order to annotate cell types, but based on methylation instead (there is a rationale behind this; it is not a random attempt).
Has anyone ever performed this type of analysis (not necessarily on methylation values, but on any data that isn't RNA-seq)?
I first tried to run a PCA, after the normalization, variable-feature selection and scaling steps (Seurat's RunPCA would not work without the layers generated by those steps). However, it throws the following error:
Error: None of the requested features have any variance
Any help or insights would be much appreciated!
Thank you :)
I would skip Seurat entirely here (and generally), and rather use more transparent methods, such as the ones in the Bioconductor framework. Can you share an example of the data?
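In the meantime, since the error complains about variance, one check that can be done independently of Seurat is to compute the per-gene variance across cells and count how many genes are constant. Below is a minimal sketch in Python/NumPy. The toy matrix and the function name `feature_variances` are made up for illustration, and this only inspects the data; it does not explain what Seurat does internally.

```python
import numpy as np

def feature_variances(mat):
    """Per-gene sample variance across cells; mat is genes x cells."""
    mat = np.asarray(mat, dtype=float)
    return mat.var(axis=1, ddof=1)

# Hypothetical toy matrix (4 genes x 5 cells), not real data.
toy = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0],    # constant zero
    [1.2, -0.5, 0.0, 2.1, -1.7],  # variable
    [0.3, 0.3, 0.3, 0.3, 0.3],    # constant non-zero
    [0.0, 0.0, 2.5, 0.0, -2.0],   # mostly zero but variable
])

variances = feature_variances(toy)
zero_var = np.isclose(variances, 0.0)           # genes with (numerically) no variance
n_non_finite = int((~np.isfinite(toy)).sum())   # NaN/Inf entries would also break a PCA

print("genes without variance:", int(zero_var.sum()), "of", toy.shape[0])
print("non-finite entries:", n_non_finite)
```

If most genes in the real matrix turn out to be variable, the problem is more likely in how the object or its layers were set up than in the values themselves.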
Thank you ATpoint and jared.andrews07 for your help :)
My data is really just a sparse matrix with genes as rows and cells as columns. The values are mostly between -3 and +3. When a gene in a given cell has a positive value, it means that the methylation fraction in that gene is higher than the cell's methylation fraction; when it is negative, the opposite holds.
In my experiment, gene methylation correlates with gene expression, and therefore I want to check whether I can annotate the cell types based on these methylation data. This is why I want to perform clustering.
Would the Bioconductor framework work in that context? I have never used it, so I honestly don't know what I should use.
Thank you again for your time and advice; it is really appreciated.
Sure, Bioc is generic. See, for example, this guide: https://bioconductor.org/books/3.21/OSCA.basic/clustering.html#implementation
The specific code examples there start from a SingleCellExperiment, but the help pages of the functions will show you that they can also take generic inputs such as a (sparse) matrix. It comes down to running feature selection, dimensionality reduction and then some sort of clustering on your data.
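To make those three steps concrete, here is a rough, framework-free sketch in Python/NumPy. It is not the Bioconductor API. The function names, the simulated matrix, the plain variance ranking and the use of k-means are all assumptions chosen for illustration, and the matrix is dense here rather than sparse.

```python
import numpy as np

def select_variable_features(mat, n_top):
    """Indices of the n_top genes (rows) with the highest variance across cells."""
    variances = mat.var(axis=1, ddof=1)
    return np.argsort(variances)[::-1][:n_top]

def pca_scores(mat, n_components):
    """PCA via SVD; mat is genes x cells, result is cells x n_components."""
    x = mat.T - mat.T.mean(axis=0)            # cells x genes, each gene centered
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of points."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Simulated stand-in for the real matrix: 200 genes x 60 cells, two groups of
# 30 cells that differ in the first 20 genes (purely hypothetical values).
rng = np.random.default_rng(42)
mat = rng.normal(0.0, 0.3, size=(200, 60))
mat[:20, :30] += 2.0
mat[:20, 30:] -= 2.0

top = select_variable_features(mat, n_top=20)   # 1) feature selection
scores = pca_scores(mat[top], n_components=5)   # 2) dimensionality reduction
labels = kmeans(scores, k=2)                    # 3) clustering
print(labels)
```

On real data the number of selected genes, the number of components and the clustering method all need tuning, and the guide linked above covers more principled choices for each step.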