Question: Confusion about FindMarkers(), FindVariableFeatures(), RunTSNE(), and RunUMAP() in the Seurat package
13 months ago, F. Golestan wrote:


I am using the Seurat package to analyze my single-cell RNA-seq data. I read their basic PBMC tutorial, but I am confused about some steps. I would greatly appreciate answers to the following questions:

1) Does FindVariableFeatures() work on normalized data or on scaled data?

In the Seurat tutorial it comes after NormalizeData() and before ScaleData(), but in the kallisto | bustools tutorial (file:// it comes after both NormalizeData() and ScaleData().

2) Does FindMarkers() work on normalized data, on scaled data, or on the clusters obtained by RunTSNE()/RunUMAP()?

Since FindMarkers() finds marker genes for the clusters, I assume its input should be the clusters obtained by RunTSNE()/RunUMAP(), which in turn use the scaled data or the selected PCs (obtained from RunPCA()); see below. So in practice FindMarkers() would also be using scaled data, whereas it should work on normalized data.

RunPCA() on Scaled_Data --> Selection of PCAs --> RunTSNE()/RunUMAP() --> FindMarkers()

3) If FindMarkers() works on normalized data (rather than scaled data), how can it take as input clusters obtained by RunTSNE()/RunUMAP(), which themselves use scaled data (not normalized data)?

Many thanks for any help and clarifications.

13 months ago, jared.andrews07 (Memphis, TN) wrote:

You appear to have a few misconceptions about what Seurat is doing for each command. I will do my best to address them.

  1. Neither; with the default "vst" selection method it just uses the raw counts. This step only identifies the most variable genes, which limits the size of the dense matrix returned by ScaleData; that matrix is very memory intensive and will bloat the object greatly if all genes are used. In addition, the top few thousand most variable genes have been shown to be sufficient for marker identification.

  2. RunTSNE and RunUMAP do not perform clustering; that is done by FindClusters. FindMarkers uses the data slot by default (normalized counts). It would make no sense to use the scale.data slot, as those values are basically just z-scores (or residuals if using SCTransform or one of the integration methods). Depending on the test used, it may sometimes make more sense to use the counts slot.

  3. Again, these methods are not performing clustering. Clustering is performed by FindClusters, after FindNeighbors has constructed a shared nearest neighbor graph from the output of RunPCA, using the PCA embeddings to determine similarities between cells. The clustering function then groups cells into clusters based on these similarities, with an adjustable resolution that defines how granular the distinctions between clusters should be.
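
To make the order of operations in the three points above concrete, here is a minimal sketch of the workflow. It assumes the small example object pbmc_small that ships with Seurat as a stand-in for real data; the values chosen for nfeatures, npcs, dims, and resolution are illustrative only, picked to suit such a tiny dataset, and are not recommendations.

```r
library(Seurat)

# Assumption: pbmc_small (80 cells, bundled with Seurat) stands in for your data.
data("pbmc_small")
obj <- pbmc_small

# Raw counts -> log-normalized values (stored as "data").
obj <- NormalizeData(obj, verbose = FALSE)

# Select variable genes; the default "vst" method works from the raw counts.
obj <- FindVariableFeatures(obj, selection.method = "vst",
                            nfeatures = 50, verbose = FALSE)

# Z-score the variable genes (stored as "scale.data"), then run PCA on them.
obj <- ScaleData(obj, verbose = FALSE)
obj <- RunPCA(obj, npcs = 10, verbose = FALSE)

# Clustering happens here: SNN graph on the PCA embeddings, then FindClusters.
obj <- FindNeighbors(obj, dims = 1:10, verbose = FALSE)
obj <- FindClusters(obj, resolution = 1, verbose = FALSE)

# UMAP is for visualization only; it does not assign clusters.
obj <- RunUMAP(obj, dims = 1:10, verbose = FALSE)

# Marker test for the first cluster against all other cells. By default this
# uses the normalized data together with the identities set by FindClusters.
# Assumes FindClusters found at least two clusters.
markers <- FindMarkers(obj, ident.1 = levels(obj)[1])
head(markers)
```

Note that FindMarkers only needs the cluster labels (the cell identities) from the clustering step; the expression values it tests are taken from the normalized data, not from the UMAP/t-SNE coordinates or the scaled matrix.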

I highly recommend reading the manual for the commands in question to better understand what they're doing.


Many thanks for your helpful explanations. I got the points. Thanks a lot.

Reply written 12 months ago by F. Golestan
