Question: why PCA for RNA-Seq but tSNE for scRNA-seq?
11 months ago by
United States
CY310 wrote:

I am under the impression that, in general, PCA is used for RNA-Seq but tSNE is used for scRNA-seq.

Can anyone share some comments on why this is the case? Is it because of some intrinsic difference between mRNA and scRNA?

rna-seq tsne pca scrna-seq • 4.2k views
modified 5 months ago by Minstein70 • written 11 months ago by CY310

Already some fine answers, but just a quick comment. These two techniques are fundamentally different in terms of their mathematics and their goals. Performing tSNE on bulk RNA-seq does not make much sense to me because, being 'bulk', it's just a heterogeneous mixture of cells that have been extracted from a biopsy, and we cannot know (from looking at the data) which expression signals come from which cells. You could use tSNE on bulk data, no problem, but it would require parameter re-configuration, as Devon implies. One could also easily perform PCA on both bulk and scRNA-seq.

tSNE is used on scRNA-seq because this type of sequencing gives us expression values on a cell-wise basis, so tSNE is one of many methods that look for relationships between these cells and attempt to assign groups of cells to cell populations that way. A similar thing is performed in CyTOF analysis.

For bulk RNA-seq, an equivalent to tSNE applied to scRNA-seq data would be the method known as cell deconvolution, i.e., looking at the bulk data and trying to determine the proportion of, for example, immune cell-types in the data.
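To make the deconvolution idea concrete, here is a minimal sketch on simulated data. It assumes the bulk profile is a linear mixture of per-cell-type signature profiles and estimates the proportions with non-negative least squares (NNLS). The signature matrix, the proportions, and the choice of NNLS are illustrative assumptions, not the method of any particular deconvolution tool.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

n_genes, n_cell_types = 50, 3
# Hypothetical signature matrix: mean expression of each gene in each pure cell type.
signature = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_cell_types))

# Simulate one bulk sample as a noise-free mixture with known proportions.
true_props = np.array([0.6, 0.3, 0.1])
bulk = signature @ true_props

# Non-negative least squares: weights >= 0 minimising ||signature @ w - bulk||.
weights, residual = nnls(signature, bulk)

# Rescale so the estimated proportions sum to 1.
est_props = weights / weights.sum()
print(np.round(est_props, 3))
```

With real data the mixture is noisy and the signature matrix is itself an estimate, so the recovered proportions are only approximate.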

I give an 'intro' to PCA here: A: PCA in a RNA seq analysis

modified 5 months ago • written 11 months ago by Kevin Blighe41k

Can I say that PCA focuses on the sample distance (variance) and can show outliers, whereas tSNE focuses on local structure at the expense of long-distance information and may mis-cluster some samples together when actually they are far away?

written 11 months ago by CY310

may mis-cluster some samples together when actually they are far away

I'd rather say that the main caveat is to visually make out patterns in the resulting plot and assign meaning to them without further corroboration. This is a great website to familiarize yourself empirically with tSNE results and the effects of both the parameters as well as the underlying data structures.

written 11 months ago by Friederike3.7k
11 months ago by
Devon Ryan89k
Freiburg, Germany
Devon Ryan89k wrote:

This has more to do with the goal of the techniques. PCA in bulk RNAseq is intended for QC, to see if there are outliers. tSNE in scRNA-seq is used to find cell types or other groups. You could use tSNE in place of PCA in bulk RNAseq, but since there are parameters to tweak and it's more computationally expensive, there's no great benefit in doing so.
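As a rough illustration of PCA as a QC step, the sketch below simulates a small bulk experiment (6 samples, 200 genes, all values made up), computes the principal components via SVD, and flags the sample that sits farthest from the others on PC1/PC2. The distance-from-median rule is just one simple heuristic chosen for this example; in practice people usually inspect the PCA plot by eye.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_genes = 6, 200
# Simulated log-expression matrix (samples x genes) with small technical noise.
expr = rng.normal(loc=5.0, scale=0.1, size=(n_samples, n_genes))
# Samples 3-5 form a "treated" group with a shift in the first 20 genes.
expr[3:, :20] += 1.0
# Sample 5 is a made-up outlier (e.g. a failed library) shifted in 100 genes.
expr[5, 100:] += 3.0

# PCA via SVD of the gene-centred matrix.
centred = expr - expr.mean(axis=0)
u, s, vt = np.linalg.svd(centred, full_matrices=False)
scores = u * s                     # sample coordinates on each PC
explained = s**2 / np.sum(s**2)    # fraction of variance per PC

# Flag the sample farthest from the median position on PC1/PC2.
pc12 = scores[:, :2]
dist = np.linalg.norm(pc12 - np.median(pc12, axis=0), axis=1)
outlier = int(np.argmax(dist))
print("variance explained by PC1, PC2:", np.round(explained[:2], 2))
print("most extreme sample (0-based index):", outlier)
```

Here the outlier dominates PC1, which is the pattern you would want to catch before any differential expression analysis.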

written 11 months ago by Devon Ryan89k
11 months ago by
United States
Friederike3.7k wrote:

My impression is that one of the main reasons is the simple fact that tSNE hasn't been around as long as PCA (Reference). Plus, PCA tends to work well on bulk RNA-seq data and, mathematically, its application for detecting outliers, as Devon pointed out, makes sense since it calculates the vectors along which the variation is maximal -- in bulk RNA-seq, you typically want the variation to stem from experimental factors rather than from individual samples, so PCA is a good check for that. In scRNA-seq you typically don't have many different samples; instead you have thousands of different cells, usually stemming from only one or a handful of different samples.

written 11 months ago by Friederike3.7k
11 months ago by
United States
harold.smith.tarheel4.3k wrote:

There's a decent non-mathematical description of the relative features of PCA and tSNE (as well as diffusion maps) in this review.

written 11 months ago by harold.smith.tarheel4.3k
8 months ago by
United Kingdom, London
chris86250 wrote:

I think this is largely because of how the clusters are distributed in sample space. For instance, in cancer research we usually have overlapping Gaussian clusters, often quite a simple structure. Both PCA and tSNE work fine to show these structures in my experience. Sometimes, but rarely, the structures in some datasets may be more complex, approaching single-cell RNA-seq complexity, and tSNE works better in these situations.

In single-cell RNA-seq we often have far more complex structures, usually consisting of many globular clusters (cell types) of different sizes and variances arranged in complex patterns in sample space. tSNE can capture complex non-linear structures well; PCA can't.

Edit: you may want to look into the UMAP algorithm.

modified 12 weeks ago • written 8 months ago by chris86250
7 months ago by
Charles Warden6.6k
Duarte, CA
Charles Warden6.6k wrote:

I think the main advantage of tSNE is that it will space out the data points, but a disadvantage is that you need to be careful about over-interpreting some swirly shapes within the tSNE plot (and PCA is often OK with the smaller number of samples in an RNA-Seq project). However, Devon is also correct that tSNE is often used for separating cell types (and I would expect that to be increasingly true as larger numbers of cells/samples are considered).

It may also be helpful to review Difference between tSNE and PCA analysis, which I used to confirm that PCA is used as an upstream step for tSNE.

modified 7 months ago • written 7 months ago by Charles Warden6.6k
5 months ago by
Minstein70 wrote:

I think tSNE is capturing the local relationships between points, like treating your data as a network where your cells are nodes. PCA is calculating the "true" distances between points after we consider variation in your dataset. Mahalanobis distance can do similar things.
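The link between PCA and Mahalanobis distance can be checked numerically. The sketch below (simulated data, all values hypothetical) shows that the Euclidean distance between two points after PCA whitening, i.e. rotating onto the principal axes and scaling each axis to unit variance, equals their Mahalanobis distance. Note the assumption: this holds for whitened PCA with all components kept. Plain PCA without rescaling is only a rotation and leaves Euclidean distances unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 500 points with 3 correlated features.
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.5, 1.0, 0.0],
                   [0.5, 0.3, 0.2]])
x = rng.normal(size=(500, 3)) @ mixing.T

cov = np.cov(x, rowvar=False)
diff = x[0] - x[1]

# Mahalanobis distance between the first two points (sample covariance).
d_mahal = np.sqrt(diff @ np.linalg.solve(cov, diff))

# PCA whitening: principal axes, each divided by its standard deviation.
eigvals, eigvecs = np.linalg.eigh(cov)
whiten = eigvecs / np.sqrt(eigvals)   # column j is PC j scaled by 1/sd_j
z = (x - x.mean(axis=0)) @ whiten

# Euclidean distance in the whitened PCA space.
d_whitened = np.linalg.norm(z[0] - z[1])

# Plain PCA (rotation only) keeps the ordinary Euclidean distance.
d_plain_pca = np.linalg.norm(diff @ eigvecs)
d_euclid = np.linalg.norm(diff)

print(d_mahal, d_whitened)
print(d_plain_pca, d_euclid)
```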

Applying PCA before tSNE in effect projects your data into a low-dimensional subspace, where the distances between points are more meaningful, and therefore you can obtain more meaningful local relationships between points.
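The PCA-then-tSNE workflow can be sketched with scikit-learn as follows. The data are simulated (three made-up "cell types"), and the number of components and the perplexity are arbitrary choices for illustration, not recommended settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

n_per_type, n_genes = 100, 500
# Three simulated cell types, each with its own mean expression profile.
centres = rng.normal(scale=2.0, size=(3, n_genes))
cells = np.vstack([c + rng.normal(size=(n_per_type, n_genes)) for c in centres])
labels = np.repeat(np.arange(3), n_per_type)

# Step 1: PCA reduces 500 genes to 20 components, discarding much of the noise.
pcs = PCA(n_components=20, random_state=0).fit_transform(cells)

# Step 2: tSNE embeds the 20-dimensional PCA scores into 2 dimensions.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(pcs)
print(embedding.shape)
```

Plotting `embedding` coloured by `labels` is the usual next step. Rerunning with a different `perplexity` or `random_state` will change the layout, which is one reason not to over-interpret the shapes in a single plot.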

written 5 months ago by Minstein70