Question: In PCA on RNA-Seq data, what does Gaussian distribution refer to?
5 months ago by
United States
CY480 wrote:

I am aware that PCA performed on Gaussian-distributed data (such as RNA-Seq) ensures that the factors are independent as well as uncorrelated. I am having difficulty understanding what 'Gaussian distribution' refers to here.

Take gene expression data as an example. I have a matrix in which each row is a gene and each column is an individual. If I perform a PCA on this matrix hoping to uncover several expression patterns, what does 'Gaussian distribution' refer to? Is it the expression of each gene within a specific pattern, or the expression of a single gene across individuals?

rna-seq gaussian pca • 194 views
5 months ago by
Devon Ryan95k
Freiburg, Germany
Devon Ryan95k wrote:

The data does not need to be Gaussian, and RNA-seq data is not Gaussian unless you transform it. PCA has more utility (in that the results look nicer) when you can transform the data so that the variance across genes is roughly Gaussian, but that isn't a requirement.

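A minimal sketch of the transformation point, under stated assumptions: counts for one hypothetical gene are simulated as a gamma-Poisson mixture (which is negative binomial), and `log2(count + 1)` stands in for "a transformation". Neither the simulated parameters nor the choice of transform comes from the thread; variance-stabilizing transforms are another common option. The sketch only compares how skewed the values are before and after.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def skewness(xs):
    """Sample skewness: third central moment over variance^1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)

# Assumed (hypothetical) parameters: gamma shape 4, scale 25, so mean count 100.
# A Poisson rate drawn from a gamma distribution gives negative binomial counts.
counts = [poisson(rng.gammavariate(4.0, 25.0), rng) for _ in range(5000)]

# log2(count + 1) is one common transform; the +1 avoids log(0).
logged = [math.log2(c + 1) for c in counts]

skew_raw = skewness(counts)
skew_log = skewness(logged)
print(f"skewness of raw counts:      {skew_raw:.2f}")
print(f"skewness of log2(count + 1): {skew_log:.2f}")
```

The raw counts have a long right tail, while the logged values are closer to symmetric. That is nearer to a Gaussian shape, though still not exactly Gaussian.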

The reason I am asking is to compare the rationale of PCA and ICA for separating expression patterns in RNA-Seq data. ICA extracts independent factors, while PCA extracts linearly uncorrelated factors. If transformed RNA-Seq data can approximate a Gaussian distribution (the raw counts being negative binomial, i.e. with greater variance), then PCA on such approximately Gaussian data uncovers factors that are independent as well as uncorrelated.

written 5 months ago by CY480
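The step from "uncorrelated" to "independent" in the comment above relies on a standard property of the multivariate normal distribution. As a sketch, with notation assumed here rather than taken from the thread: let $y_1, \dots, y_k$ be centered principal component scores that are jointly Gaussian, and let $\lambda_i$ be the variance of $y_i$ (the $i$-th eigenvalue of the covariance matrix). Because PCA makes the covariance of the scores diagonal, the joint density factorizes into a product of marginals:

```latex
p(y_1, \dots, y_k)
  = \frac{1}{\sqrt{(2\pi)^k \prod_{i=1}^{k} \lambda_i}}
    \exp\!\Big( -\frac{1}{2} \sum_{i=1}^{k} \frac{y_i^2}{\lambda_i} \Big)
  = \prod_{i=1}^{k} \frac{1}{\sqrt{2\pi\lambda_i}}
    \exp\!\Big( -\frac{y_i^2}{2\lambda_i} \Big)
```

Without joint Gaussianity the first equality does not hold, so zero correlation alone does not give independence.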

PCA necessarily finds uncorrelated (orthogonal) factors; that's literally how it works. They are also independent only if the data are jointly Gaussian. Those factors may or may not be related to anything biologically relevant, of course. ICA is a bit harder to compute and generally of lower utility, except when you KNOW that the resulting components have a biological interpretation. As an example, I've used ICA to look at samples that were mixtures of multiple cell types. I knew how many cell types there were, so this was useful. Normally one uses PCA for generic QC, for which ICA doesn't benefit you in any way.

written 5 months ago by Devon Ryan95k
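The difference between uncorrelated and independent factors can be checked numerically. The following is a minimal sketch, assuming two hypothetical non-Gaussian (uniform) sources and an arbitrary shear mixing; none of these choices come from the thread. It runs a two-variable PCA in closed form and tests whether the squared scores are still related, which they would not be if the scores were independent.

```python
import math
import random


def centered(xs):
    """Subtract the sample mean."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


def cov(xs, ys):
    """Covariance of two sequences that are already centered."""
    return sum(x * y for x, y in zip(xs, ys)) / len(xs)


def corr(xs, ys):
    """Pearson correlation (centers its inputs first)."""
    xs, ys = centered(xs), centered(ys)
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))


rng = random.Random(1)
n = 5000

# Two independent, non-Gaussian sources (assumed for illustration).
s1 = [rng.uniform(-1, 1) for _ in range(n)]
s2 = [rng.uniform(-1, 1) for _ in range(n)]

# Observed variables: a non-orthogonal (shear) mixture of the sources.
x1 = centered([a + b for a, b in zip(s1, s2)])
x2 = centered(s2)

# Closed-form 2-D PCA: rotate by the angle that diagonalizes the covariance.
theta = 0.5 * math.atan2(2 * cov(x1, x2), cov(x1, x1) - cov(x2, x2))
c, s = math.cos(theta), math.sin(theta)
pc1 = [c * a + s * b for a, b in zip(x1, x2)]
pc2 = [-s * a + c * b for a, b in zip(x1, x2)]

corr_scores = corr(pc1, pc2)
corr_squares = corr([v * v for v in pc1], [v * v for v in pc2])
print(f"correlation of PC scores:         {corr_scores:.3f}")
print(f"correlation of squared PC scores: {corr_squares:.3f}")
```

The PC scores are uncorrelated by construction, but their squares remain clearly correlated, so the scores are not independent. With jointly Gaussian data the second correlation would also be close to zero, which is the special case discussed in this thread.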