Hi all,
I have two sets of cells, each defined by a distinct modality:
- one set is defined by RNA expression from scRNA-seq data
- one set is defined by custom values reflecting methylation states
I would like to use the MOSCOT "Translating multiomics single-cell data" tutorial ( https://moscot.readthedocs.io/en/latest/notebooks/tutorials/600_tutorial_translation.html ) to align these two modalities and then perform clustering on the two groups of cells downstream.
My problem is the following: more than 80% of the methylation values are negative, and those negative values correspond to noise (methylation that did not work well). So I am mostly interested in the positive values; genes with high values are very interesting to me and are expected to also have high RNA expression. While the RNA expression is normalized by library size and log-transformed, I am struggling to find a way to normalize the methylation values. I'm worried that applying a z-score would hide the importance of my positive values, since everything (including the negative values) would be scaled and shifted. So I need a normalization for the methylation data that takes into account the importance of this minority of positive values while keeping them comparable with the log-normalized RNA expression values (because MOSCOT is based on optimal transport, which is sensitive to scale and implicitly assumes comparability across the distributions of the two modalities).
PS: I can set the negative methylation values to zero if that helps.
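To make the clipping idea concrete, here is a minimal sketch of one thing I could try; it is only an illustration, not a validated normalization. It sets negative methylation values to zero and then rescales the remaining positive values so that their 99th percentile matches the 99th percentile of the non-zero log-normalized RNA values. The function name `clip_and_rescale`, the percentile `q`, and the toy arrays are placeholders I made up, and nothing here uses the MOSCOT API.

```python
import numpy as np

def clip_and_rescale(meth, rna_lognorm, q=99.0):
    """Hypothetical helper: zero out negative methylation values, then
    rescale so the q-th percentile of the positive methylation values
    matches the q-th percentile of the non-zero RNA values."""
    meth = np.asarray(meth, dtype=float)
    rna = np.asarray(rna_lognorm, dtype=float)

    # Treat negative values as noise and set them to zero.
    clipped = np.clip(meth, 0.0, None)

    pos_meth = clipped[clipped > 0]
    pos_rna = rna[rna > 0]
    if pos_meth.size == 0 or pos_rna.size == 0:
        # Nothing to match against; return the clipped values unchanged.
        return clipped

    # A single global scale factor keeps zeros at zero and preserves
    # the ordering of the positive values.
    scale = np.percentile(pos_rna, q) / np.percentile(pos_meth, q)
    return clipped * scale

# Toy data only: mostly negative "methylation" scores and log1p counts.
rng = np.random.default_rng(0)
meth = rng.normal(loc=-1.0, scale=1.0, size=(50, 20))
rna = np.log1p(rng.poisson(1.0, size=(60, 20)))

meth_scaled = clip_and_rescale(meth, rna)
```

Unlike a z-score, this does not shift the data, so the zeros stay at zero and only the positive values are stretched or shrunk. I do not know whether matching a single percentile is enough for the two distributions to be comparable for optimal transport.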
Any help on this topic would be much appreciated. Thank you very much!