Question: Why PCA for RNA-Seq but tSNE for scRNA-seq?
CY (United States) wrote, 22 months ago:

I am under the impression that, in general, PCA is used for RNA-Seq but tSNE is used for scRNA-seq.

Can anyone comment on why this is the case? Is it because of some intrinsic difference between bulk RNA-seq and scRNA-seq data?

Tags: rna-seq, tsne, pca, scrna-seq

Already some fine answers, but just a quick comment. These two techniques are fundamentally different in their mathematics and their goals. Performing tSNE on bulk RNA-seq does not make much sense to me because, being 'bulk', each sample is usually a heterogeneous mixture of cells extracted from a biopsy, and we cannot tell from the data which expression signals come from which cells.

tSNE is used on scRNA-seq because this type of sequencing gives us expression values on a cell-wise basis. tSNE is one of many methods that look for relationships between these cells and attempt to group them into cell populations that way. A similar thing is done in CyTOF analysis.

For bulk RNA-seq, the closest equivalent to tSNE applied to scRNA-seq data would be cell deconvolution, i.e., looking at the bulk data and trying to determine the proportions of, for example, immune cell types in the sample.

It should be pointed out that scRNA-seq pipelines allow you to perform PCA on your data prior to performing tSNE.
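
A minimal sketch of that PCA-then-tSNE order, assuming scikit-learn is available. The synthetic "cells", the choice of 20 principal components, and the perplexity of 30 are illustrative assumptions, not values taken from any particular scRNA-seq pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
n_per_type, n_genes = 50, 200

# Synthetic stand-in for a normalised cells x genes matrix:
# three hypothetical cell types, each scattered around its own centre.
centers = rng.normal(scale=3.0, size=(3, n_genes))
cells = np.vstack(
    [center + rng.normal(size=(n_per_type, n_genes)) for center in centers]
)

# Step 1: PCA reduces 200 genes to 20 components (a linear projection).
pcs = PCA(n_components=20, random_state=0).fit_transform(cells)

# Step 2: tSNE embeds the PCA scores, not the raw matrix, into 2D.
embedding = TSNE(
    n_components=2, perplexity=30.0, init="pca", random_state=0
).fit_transform(pcs)

print(pcs.shape, embedding.shape)
```

The two columns of `embedding` are what would be drawn as the tSNE plot. Changing `perplexity` or `random_state` can change the picture noticeably, which is one reason to treat the plot as exploratory.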

written 22 months ago by Kevin Blighe (modified 11 months ago)

Can I say that PCA focuses on the distances (variance) between samples and can reveal outliers? And that tSNE, on the other hand, focuses on local structure at the expense of long-distance information, and may cluster some samples together when they are actually far apart?

written 22 months ago by CY

"may mis-cluster some samples together when actually they are far away"

I'd rather say that the main caveat is visually making out patterns in the resulting plot and assigning meaning to them without further corroboration. This is a great website to familiarize yourself empirically with tSNE results and with the effects of both the parameters and the underlying data structures.

written 22 months ago by Friederike
Devon Ryan (Freiburg, Germany) wrote, 22 months ago:

This has more to do with the goals of the techniques. PCA in bulk RNA-seq is intended for QC, to see if there are outliers. tSNE in scRNA-seq is used to find cell types or other groups. You could use tSNE in place of PCA in bulk RNA-seq, but since there are parameters to tweak and it is more computationally expensive, there is no great benefit in doing so.
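
As a rough illustration of PCA as a QC step, here is a sketch using NumPy only. The expression matrix is synthetic, the "failed library" in sample 11 is simulated, and the flagging rule (median plus 5 MADs on distance in PC1/PC2 space) is an arbitrary assumption for the example, not an established threshold.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_genes = 12, 500

# Synthetic log-expression matrix (samples x genes); not real data.
expr = rng.normal(loc=8.0, scale=0.5, size=(n_samples, n_genes))
expr[6:, :50] += 2.0      # hypothetical treatment effect in samples 6-11
expr[11, 100:300] += 5.0  # simulated technical failure in sample 11

# PCA via SVD of the gene-centred matrix.
centered = expr - expr.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U * S                      # sample coordinates on each PC
explained = S**2 / np.sum(S**2)     # fraction of variance per PC

# Flag samples that sit far from the bulk of samples in PC1/PC2 space.
pcs = scores[:, :2]
dist = np.linalg.norm(pcs - np.median(pcs, axis=0), axis=1)
mad = np.median(np.abs(dist - np.median(dist)))
flagged = np.flatnonzero(dist > np.median(dist) + 5 * mad)

print("variance explained by PC1, PC2:", explained[:2])
print("most distant sample:", int(np.argmax(dist)))
```

In practice one would usually just plot PC1 against PC2 and look; the distance rule is only there to make the idea checkable in code.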

Friederike (United States) wrote, 22 months ago:

My impression is that one of the main reasons is the simple fact that tSNE hasn't been around as long as PCA (Reference). Plus, PCA tends to work well on bulk RNA-seq data. Mathematically, its application for detecting outliers, as Devon pointed out, makes sense because it calculates the vectors along which the variation is maximal. In bulk RNA-seq, you typically want the variation to stem from experimental factors rather than from individual samples, so PCA is a good check for that. In scRNA-seq you typically don't have many different samples; instead you have thousands of different cells, usually stemming from only one or a handful of samples.

harold.smith.tarheel (United States) wrote, 22 months ago:

There's a decent non-mathematical description of the relative features of PCA and tSNE (as well as diffusion maps) in this review.

Minstein wrote, 16 months ago:

I think tSNE captures the local relationships between points, like treating your data as a network in which your cells are nodes. PCA calculates the "true" distances between points once the variation in your dataset is taken into account. Mahalanobis distance can do similar things.
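
One way to make the Mahalanobis comparison concrete (this is an interpretation of the comment above, on synthetic data): plain PCA scores with all components kept preserve Euclidean distances, while PCA scores rescaled to unit variance ("whitened") give Euclidean distances equal to the Mahalanobis distance. The mixing matrix below is made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 3 correlated features.
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.5, 0.5]])
X = rng.normal(size=(200, 3)) @ mixing.T

Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# PCA via eigendecomposition of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(cov)
scores = Xc @ eigvecs                  # PCA scores, all components kept
whitened = scores / np.sqrt(eigvals)   # each component scaled to unit variance

def mahalanobis(u, v, cov):
    """Mahalanobis distance between u and v under covariance cov."""
    d = u - v
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

d_euclid = float(np.linalg.norm(X[0] - X[1]))
d_pca = float(np.linalg.norm(scores[0] - scores[1]))
d_maha = mahalanobis(X[0], X[1], cov)
d_white = float(np.linalg.norm(whitened[0] - whitened[1]))

print(np.isclose(d_euclid, d_pca), np.isclose(d_maha, d_white))
```

When only the top components are kept, as is usual before tSNE, the distances are approximations that drop the low-variance directions.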

Applying PCA before tSNE projects your data into a low-dimensional subspace where the distances between points are more reliable, so the local relationships between points that you obtain should be more reliable too.

chris86 (London, United Kingdom) wrote, 19 months ago:

I think this is largely because of how the clusters are distributed in sample space. For instance, in cancer research we usually have overlapping Gaussian clusters, often a quite simple structure. Both PCA and tSNE work fine to show these structures in my experience. Sometimes, but rarely, the structures in some datasets are more complex, approaching single-cell RNA-seq complexity, and tSNE works better in these situations.

In single-cell RNA-seq we often have far more complex structures, usually consisting of many globular clusters (cell types) of different sizes and variances arranged in complex patterns in sample space. tSNE can capture complex non-linear structures well; PCA, being a linear method, can't.

Edit: you may want to look into the UMAP algorithm.

Charles Warden (Duarte, CA) wrote, 19 months ago:

I think the main advantage of tSNE is that it spaces out the data points. A disadvantage is that you need to be careful about over-interpreting swirly shapes within the tSNE plot (and PCA is often OK with the smaller number of samples in an RNA-Seq project). However, Devon is also correct that tSNE is often used for separating cell types (and I would expect that to be increasingly true as larger numbers of cells/samples are considered).

It may also be helpful to review "Difference between tSNE and PCA analysis", which I used to confirm that PCA is used as an upstream step for tSNE.
