Hi all,

I am a little confused about the definition of linear separability in the context of principal component analysis on gene expression data. I would really appreciate clarification and hope that someone can provide it. I apologise if this is the wrong forum in which to do so.

I understand the definitions of linear and non-linear relationships and what they mean in terms of the correlations they capture, but I think the 'real world' example of what is actually happening is confusing me.

I am aware that PCA can only capture linear separability in RNA-seq. My confusion is as follows:

I have a PCA of RNA-seq data that reduces dimensionality for multiple organs and also across ages. While I am aware that I am looking at transcriptional variation here, can someone explain where the linear separation comes in? Is this between samples, as in organs and/or ages?

How does linear separability manifest in the context of gene expression between one organ and another? Is this when the change in expression of genes in different organs is not 'dependent' on the change of others?

Thank you in advance!

It's purely mathematical. Don't try to understand it through biological meaning.

PCA and linear separability are two different things. PCA is a linear transformation that can be used for dimensionality reduction. Linear separability is a property of two sets of vectors (or, if you prefer, points): it means the two sets can be separated by a hyperplane in the high-dimensional space they live in.
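
To make the distinction concrete, here is a minimal sketch in Python with NumPy. The data are entirely made up for illustration (two hypothetical "genes", two groups of samples), and the hyperplane is picked by hand rather than learned. The two groups are linearly separable by construction, yet the first principal component follows the direction of largest variance, which in this toy case is not the direction that separates the groups.

```python
import numpy as np

# Hypothetical toy data: 100 "samples" (rows) and 2 "genes" (columns).
# Both groups span the same wide range in gene 1 and differ only
# by a small, constant offset in gene 2.
gene1 = np.tile(np.linspace(-10.0, 10.0, 50), 2)
gene2 = np.repeat([-1.0, 1.0], 50)
X = np.column_stack([gene1, gene2])
labels = np.repeat([0, 1], 50)

# Linear separability: there is a hyperplane w.x + bias = 0 with every
# sample of one group on one side and every sample of the other group
# on the other side. Here w and bias are chosen by hand (assumption).
w = np.array([0.0, 1.0])
bias = 0.0
separable_by_hyperplane = bool(np.all((X @ w + bias > 0) == (labels == 1)))

# PCA as a linear transformation: center the data, take the SVD, and
# project onto the principal axes (the rows of Vt).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T  # column 0 = PC1 scores, column 1 = PC2 scores


def ranges_overlap(a, b):
    """Return True if the value ranges of two 1-D arrays overlap."""
    return bool(a.min() <= b.max() and b.min() <= a.max())


overlap_on_pc1 = ranges_overlap(scores[labels == 0, 0], scores[labels == 1, 0])
overlap_on_pc2 = ranges_overlap(scores[labels == 0, 1], scores[labels == 1, 1])

print("separable by a hyperplane:", separable_by_hyperplane)
print("groups overlap on PC1:", overlap_on_pc1)
print("groups overlap on PC2:", overlap_on_pc2)
```

In this sketch the groups overlap completely on PC1 and only come apart on PC2. PCA never looks at the group labels, so whether organs or ages form separate clusters on a PCA plot depends on whether the differences between them happen to lie along the directions of largest variance.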