I have a CosMx dataset with a 1k RNA panel, plus three useful immunofluorescence (IF) protein expression values. These values are stored in the metadata for each cell in the Seurat object.
Is it possible to do dimensionality reduction not only on the RNA but also including the protein? For example, could you include each protein as an extra 'gene' column in the expression matrix and then scale? (This feels very wrong to my intuition; it's just a thought experiment.) Sorry if this is a naive question.
I ask because I believe the protein stains (pan-CK etc.) would be very informative for dimensionality reduction, and it would be a shame not to make the best use of them.
It's probably not a good idea to do this directly. There are several things you could try, including various standardisation/normalisation transformations of the data before dimensionality reduction, but I'd probably initially try something designed for multi-omics data integration, such as Multi-Omics Factor Analysis (MOFA).
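As one example of a standardisation applied before dimensionality reduction (a simpler, swapped-in technique, not MOFA itself), each modality can be weighted as a block before a joint PCA, in the style of Multiple Factor Analysis: z-score each block, then divide it by its largest singular value so the 3-column protein block and the ~1,000-column RNA block start on an equal footing. This is a minimal NumPy sketch under that assumption; the random arrays are placeholders for real data, and `zscore`/`block_weight` are hypothetical helpers, not functions from any package.

```python
import numpy as np

def zscore(x):
    """Center each column to mean 0 and scale it to unit variance."""
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant columns at 0 instead of dividing by zero
    return (x - x.mean(axis=0)) / sd

def block_weight(block):
    """Hypothetical helper: divide a z-scored block by its largest singular value (MFA-style)."""
    return block / np.linalg.svd(block, compute_uv=False)[0]

rng = np.random.default_rng(0)
n_cells = 500
# Placeholder data, not real CosMx values: log-normalized RNA and 3 IF intensities
rna = np.log1p(rng.poisson(1.0, size=(n_cells, 1000)).astype(float))
protein = rng.normal(size=(n_cells, 3))

rna_w = block_weight(zscore(rna))
protein_w = block_weight(zscore(protein))

# Joint PCA (via SVD) on the weighted, concatenated blocks
combined = np.hstack([rna_w, protein_w])
u, s, vt = np.linalg.svd(combined, full_matrices=False)
scores = u[:, :20] * s[:20]  # cell scores on the first 20 components
```

After weighting, each block's leading singular value is 1, so neither modality dominates the first component purely by its number of columns. This is still a single-matrix linear method; MOFA instead fits a factor model that keeps each modality as a separate view.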
Dimensionality reduction methods do not know, nor do they care, about the exact nature of the data columns. I will use PCA as an example. We can provide any set of numerical columns, each normalized to zero mean and unit variance, and PCA will reduce them to principal components. If your goal is only to show the separation of samples, and you disclose what went into the plots, I don't see a problem in mixing data types. I also think the protein expression data might be useful for your purpose by contributing more to the variance than the RNA data.
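To make the PCA argument concrete, here is a minimal NumPy sketch of exactly the thought experiment from the question: z-score every column (1,000 RNA genes plus 3 proteins), concatenate them, and run PCA via SVD. The data are random placeholders, and the helper names are hypothetical. It also shows one caveat: once every column has unit variance, the 3 protein columns account for only 3/1003 of the total variance PCA sees, so on their own they carry little weight in the leading components.

```python
import numpy as np

def zscore(x):
    """Center each column to mean 0 and scale it to unit variance."""
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant columns
    return (x - x.mean(axis=0)) / sd

def pca(x, n_components):
    """PCA via SVD of an already-centered matrix; returns cell scores."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

rng = np.random.default_rng(0)
n_cells, n_genes, n_proteins = 500, 1000, 3
# Placeholder data: log-normalized RNA and IF intensities (not real CosMx values)
rna = np.log1p(rng.poisson(1.0, size=(n_cells, n_genes)).astype(float))
protein = rng.normal(size=(n_cells, n_proteins))

# Treat each protein as an extra "gene" column, then scale everything together
combined = zscore(np.hstack([rna, protein]))
scores = pca(combined, n_components=20)

# Every unit-variance column carries equal weight, so the proteins'
# share of the total variance is n_proteins / (n_genes + n_proteins)
protein_share = combined[:, -n_proteins:].var(axis=0).sum() / combined.var(axis=0).sum()
print(scores.shape, round(protein_share, 4))  # (500, 20) 0.003
```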
Dimensionality reduction reduces your feature space. This isn't necessary when K = 3 (your fluorescence values), but it is pretty necessary when K is in the thousands (your ~1,000-gene RNA panel).
Instead of trying to incorporate your fluorescence values into the RNA matrix prior to normalization + PCA, why not stack your PCs together with the fluorescence values? That way your "reduced representation" looks like [PC1, ..., PC15, IF1, IF2, IF3], i.e. 18 dimensions in total (or 23 if n_pcs=20, etc.). Any distances/embeddings built on this space will then be aware of both the dimension-reduced RNA and the full variance of your IF data.
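A minimal NumPy sketch of this stacking idea, assuming the RNA matrix is already normalized and the three IF values have been pulled out of the cell metadata into a cells × 3 array (random placeholders here). Z-scoring the IF columns and the hypothetical `if_weight` parameter are assumptions of this sketch, not part of the suggestion above: because the stacked columns are on different scales, how you scale the IF values decides how much they influence any distances built on the result.

```python
import numpy as np

def rna_pcs(rna, n_pcs):
    """PCA (via SVD) on column-standardized RNA; returns cell scores."""
    rna_z = (rna - rna.mean(axis=0)) / rna.std(axis=0)
    u, s, _ = np.linalg.svd(rna_z, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]

def stack_pcs_with_if(pcs, if_values, if_weight=1.0):
    """Append z-scored IF columns (optionally reweighted) to the RNA PCs."""
    if_z = (if_values - if_values.mean(axis=0)) / if_values.std(axis=0)
    return np.hstack([pcs, if_weight * if_z])

rng = np.random.default_rng(0)
n_cells = 500
rna = np.log1p(rng.poisson(1.0, size=(n_cells, 1000)).astype(float))  # placeholder RNA
if_values = rng.normal(loc=5.0, scale=2.0, size=(n_cells, 3))  # placeholder IF1..IF3

reduced = stack_pcs_with_if(rna_pcs(rna, n_pcs=15), if_values)
print(reduced.shape)  # (500, 18): PC1..PC15, IF1, IF2, IF3
```

The stacked matrix can then be used wherever a PCA embedding would be, e.g. for building a neighbour graph, clustering, or UMAP.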
That's an interesting suggestion, @LChart, thank you.