Question: In PCA on RNA-Seq data, what does Gaussian distribution refer to?
CY wrote (4 weeks ago):

I understand that PCA performed on Gaussian-distributed data (such as RNA-Seq) yields factors that are independent as well as uncorrelated. I am having difficulty understanding what 'Gaussian distribution' refers to here.

For example, take gene expression data. I have a matrix in which each row is a gene and each column is an individual. If I perform PCA on this matrix hoping to uncover several expression patterns, what does 'Gaussian distribution' refer to? Is it the expression of all genes within a given pattern, or the expression of a single gene across individuals?

Tags: rna-seq, gaussian, pca
Devon Ryan wrote (4 weeks ago):

The data do not need to be Gaussian, and RNA-seq data are not Gaussian unless you transform them. PCA has more utility (in that the results look nicer) when you can transform the data such that the variance across genes is roughly Gaussian, but that isn't a requirement.
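To make the point about transformation concrete, here is a minimal sketch using only NumPy. Everything in it is an assumption for illustration: the simulated negative binomial counts, the dispersion of 0.2, the two hypothetical sample groups, the log2(count + 1) transform, and the helper names `skewness` and `pca_scores`. It is not a recommended RNA-seq workflow, just a way to see that PCA runs on either matrix while the transformed values are far less skewed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_samples = 2000, 12
group = np.repeat([0, 1], n_samples // 2)        # two hypothetical conditions

# Hypothetical per-gene means spanning several orders of magnitude.
gene_means = rng.lognormal(mean=4.0, sigma=1.5, size=n_genes)
mu = np.tile(gene_means[:, None], (1, n_samples))
mu[:400, group == 1] *= 4.0                      # 400 genes up 4-fold in group 1

# Negative binomial counts with mean mu and variance mu + 0.2 * mu**2.
size = 1.0 / 0.2
counts = rng.negative_binomial(size, size / (size + mu))

# A simple variance-stabilising transform (an assumption, not the only choice).
log_counts = np.log2(counts + 1.0)


def skewness(values):
    """Sample skewness of all entries; roughly 0 for a symmetric distribution."""
    v = np.ravel(values).astype(float)
    v = v - v.mean()
    return np.mean(v**3) / np.mean(v**2) ** 1.5


def pca_scores(matrix, n_components=2):
    """PCA via SVD on a genes x samples matrix; samples are the observations."""
    X = matrix.T - matrix.T.mean(axis=0)         # samples x genes, centre each gene
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S**2 / np.sum(S**2))[:n_components]
    return scores, explained


scores_raw, var_raw = pca_scores(counts)
scores_log, var_log = pca_scores(log_counts)

print("skewness raw / log:", skewness(counts), skewness(log_counts))
print("variance explained by PC1, PC2 (raw):", var_raw)
print("variance explained by PC1, PC2 (log):", var_log)
```

On raw counts the leading components tend to be driven by a handful of very highly expressed genes, which is one reason the transformed version usually gives plots that are easier to read.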


CY replied (4 weeks ago):

The reason I am asking is to compare the rationale of PCA and ICA for separating expression patterns from RNA-Seq data. ICA extracts independent factors, while PCA extracts linearly uncorrelated factors. If transformed RNA-Seq data approximate a Gaussian distribution (the raw counts being negative binomial, i.e. having greater variance), then PCA on such approximately Gaussian data uncovers factors that are independent as well as uncorrelated.
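The distinction between uncorrelated and independent can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original thread: the mixing matrix `A`, the sample size, and the use of the correlation between squared scores as a crude dependence check are all choices made here. It mixes two independent sources, once Gaussian and once uniform, runs PCA, and compares the correlation of the scores with the correlation of their squares.

```python
import numpy as np


def pca_scores(X):
    """Project centred observations (rows of X) onto their principal axes."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt.T


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


rng = np.random.default_rng(1)
n = 20000
A = np.array([[1.0, 1.0], [0.0, 1.0]])           # hypothetical non-orthogonal mixing

sources = {
    "gaussian": rng.normal(size=(n, 2)),
    # Uniform on [-sqrt(3), sqrt(3)] has unit variance, like the Gaussian case.
    "uniform": rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, 2)),
}

results = {}
for name, S in sources.items():
    Y = pca_scores(S @ A.T)                      # PCA on the mixed signals
    results[name] = (
        corr(Y[:, 0], Y[:, 1]),                  # linear correlation of the scores
        corr(Y[:, 0] ** 2, Y[:, 1] ** 2),        # crude check for remaining dependence
    )
    print(name, results[name])
```

In both cases the scores are uncorrelated by construction. For the Gaussian sources the squared scores are also close to uncorrelated, consistent with independence, whereas for the uniform sources a clearly nonzero correlation between the squared scores remains, so the PCA factors are uncorrelated but still dependent.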

Devon Ryan replied (4 weeks ago):

PCA necessarily finds linearly independent (orthogonal, uncorrelated) factors; that's literally how it works. Those may or may not be related to anything biologically relevant, of course. ICA is a bit harder to compute and generally of lower utility, except when you KNOW that the resulting components have a biological interpretation. As an example, I've used ICA to look at samples that were mixtures of multiple cell types. I knew how many cell types there were, so this was useful. Normally one uses PCA for generic QC, for which ICA offers no benefit.
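The mixed-cell-type use case can be sketched as follows. This is a toy illustration under several assumptions, not the analysis described in the reply: the Laplace-distributed "cell type" profiles, the mixing proportions in `A`, and the function names `whiten` and `fast_ica_sketch` are all hypothetical, and the ICA here is a bare-bones symmetric FastICA with a tanh nonlinearity written in NumPy rather than a library implementation. The number of components is fixed in advance, mirroring the situation where the number of cell types is known.

```python
import numpy as np


def whiten(X, k):
    """Centre rows of X (samples x genes) and project onto k whitened components."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    evals, evecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:k]            # indices of the k largest
    K = evecs[:, top].T / np.sqrt(evals[top])[:, None]
    return K @ Xc


def fast_ica_sketch(X, k, n_iter=200, tol=1e-6, seed=0):
    """Minimal symmetric FastICA (tanh contrast); returns k estimated sources."""
    Z = whiten(X, k)
    n = Z.shape[1]
    rng = np.random.default_rng(seed)
    U, _, Vt = np.linalg.svd(rng.normal(size=(k, k)))
    W = U @ Vt                                   # random orthogonal start
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: E[z g(w'z)] - E[g'(w'z)] w, for all rows at once.
        W_new = G @ Z.T / n - (1.0 - G**2).mean(axis=1)[:, None] * W
        U, _, Vt = np.linalg.svd(W_new)
        W_new = U @ Vt                           # symmetric decorrelation
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return W @ Z


rng = np.random.default_rng(2)
n_genes = 5000
# Two hypothetical cell-type profiles with heavy-tailed (non-Gaussian) values.
S = rng.laplace(size=(2, n_genes))
# Four hypothetical samples, each a different mixture of the two cell types.
A = np.array([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7], [0.1, 0.9]])
X = A @ S

S_est = fast_ica_sketch(X, k=2)

# Components come back in arbitrary order and sign, so compare |correlation|.
cross = np.abs(np.corrcoef(S, S_est)[:2, 2:])
best_match = cross.max(axis=1)
print("best |correlation| per true profile:", best_match)
```

Because ICA relies on the sources being non-Gaussian, the same sketch would not be expected to separate the profiles if `S` were drawn from a normal distribution.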