Dimensionality reduction using RNA and protein expression
4 weeks ago

Hi everyone,

I have a CosMx dataset with a 1k-panel RNA assay and three useful immunofluorescence protein expression values. These values are stored in the metadata for each cell in the Seurat object.

Is it possible to do dimensionality reduction not only on the RNA but also including the protein? For example, could you include each protein as an extra 'gene' column in the expression matrix and scale? (This feels very wrong to my intuition; it is just a thought experiment.) Sorry if this is a naive question.

I ask because I believe that the protein stains (pan-CK etc.) would be very informative for dimensionality reduction, and it would be a shame not to make the best use of them.

Thanks.

dimensionality-reduction RNA-seq spatial • 442 views
4 weeks ago

It's probably not a good idea to do this directly. There are several things you could try, including various standardisation/normalisation transformations on the data before dimensionality reduction, but I'd probably initially try something designed for multi-omics data integration, such as Multi-Omics Factor Analysis (MOFA).

4 weeks ago
Mensur Dlakic ★ 29k

I ask because I believe that the protein stains (pan-CK etc.) would be very informative for dimensionality reduction, and it would be a shame not to make the best use of them.

Dimensionality reduction methods do not know, nor do they care, about the exact nature of the data columns. I will use PCA as an example. We can provide any set of numerical columns that are all normalized to zero mean and unit variance, and PCA will reduce them to principal components. If your goal is only to show the separation of samples and you disclose what went into the plots, I don't see a problem in mixing data types. I also think that the protein expression data might be useful for your purpose by contributing more to the variance than the RNA data.
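To make the PCA point concrete, here is a minimal sketch in Python with NumPy. It is not Seurat code, and everything in it is an assumption for illustration: the toy matrices, the helper names `zscore` and `pca`, and the choice of 10 components. It only shows that PCA will accept RNA and protein columns side by side once every column is standardized.

```python
import numpy as np

def zscore(x):
    """Center each column to zero mean and scale it to unit variance."""
    mu = x.mean(axis=0)
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant columns at zero instead of dividing by zero
    return (x - mu) / sd

def pca(x, n_components):
    """PCA via SVD of the column-standardized matrix.

    Returns the per-cell scores and the explained-variance ratio
    of the first n_components components.
    """
    z = zscore(x)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    var_ratio = (s ** 2) / np.sum(s ** 2)
    return scores, var_ratio[:n_components]

# Toy stand-ins (hypothetical data, not a real CosMx export):
rng = np.random.default_rng(0)
n_cells = 200
rna = rng.poisson(2.0, size=(n_cells, 50)).astype(float)  # cells x genes
protein = rng.normal(size=(n_cells, 3))                   # cells x 3 IF values

# Treat each protein as an extra "gene" column, then standardize and reduce.
combined = np.hstack([rna, protein])
scores, var_ratio = pca(combined, n_components=10)
```

One thing to keep in mind with this approach: after z-scoring, every column has unit variance, so three protein columns sitting next to roughly a thousand genes account for a small share of the total variance unless they are deliberately weighted up.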

4 weeks ago
LChart 4.9k

Dimensionality reduction reduces your feature space. This isn't necessary when K=3 (your fluorescence values), but it is pretty necessary when K is in the thousands (your RNA panel).

Instead of trying to incorporate your fluorescence values into the RNA matrix prior to normalization + PCA, why not stack your PCs together with the fluorescence values? That way your "reduced representation" looks like [PC1, ..., PC15, IF1, IF2, IF3], so 18 total dimensions (or 23 if n_pcs=20, etc.). Now any distances/embeddings built on this space will be aware of both the dim-reduced RNA and the full variance of your IF data.
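A rough sketch of that stacking step in Python with NumPy, in case it helps. The function name `stack_pcs_with_if`, the `if_weight` parameter, and the random toy inputs are all hypothetical; in practice the PCs would come from your existing RNA PCA and the IF values from the cell metadata, with rows in the same cell order. Z-scoring the IF columns is my own assumption so that they sit on a scale comparable to the PCs; it is not part of the suggestion above.

```python
import numpy as np

def stack_pcs_with_if(pcs, if_values, if_weight=1.0):
    """Build [PC1..PCk, IF1..IFm] with one row per cell.

    The IF columns are z-scored (an assumption, see the text above) and
    optionally multiplied by if_weight to tune their influence on distances.
    """
    pcs = np.asarray(pcs, dtype=float)
    if_values = np.asarray(if_values, dtype=float)
    if pcs.shape[0] != if_values.shape[0]:
        raise ValueError("pcs and if_values need one row per cell, in the same order")
    sd = if_values.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant IF columns
    if_scaled = (if_values - if_values.mean(axis=0)) / sd
    return np.hstack([pcs, if_weight * if_scaled])

# Toy stand-ins (hypothetical): 100 cells, 15 RNA PCs, 3 IF intensities.
rng = np.random.default_rng(1)
pcs = rng.normal(size=(100, 15))
if_values = rng.lognormal(size=(100, 3))

joint = stack_pcs_with_if(pcs, if_values)  # 15 + 3 = 18 columns

# Any distance computed on `joint` now reflects both RNA PCs and IF values.
d01 = np.linalg.norm(joint[0] - joint[1])
```

The resulting matrix can then be handed to whatever neighbor graph, clustering, or UMAP step you already use in place of the plain PC matrix. How strongly the IF columns should be weighted against the PCs is a judgment call that the sketch leaves to `if_weight`.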


That's an interesting suggestion @LChart, thank you.
