Hello,
I am using the Seurat package to analyse my single-cell RNA-seq data. I have read their basic PBMC tutorial https://satijalab.org/seurat/v3.1/pbmc3k_tutorial.html, but I am confused about some steps. I would greatly appreciate your answers to the following questions:
1) Does `FindVariableFeatures()` work on the normalized data or on the scaled data? In the Seurat tutorial it is called after `NormalizeData()` but before `ScaleData()`, but in the kallisto | bustools tutorial (file://filestore.soton.ac.uk/users/fsgh1d18/mydesktop/Kallisto_bustools/BUS_notebooks_R-master/docs/10xv2.html) it is called after both `NormalizeData()` and `ScaleData()`.
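For reference, the arithmetic behind the two steps can be sketched in a few lines. The NumPy code below is a hypothetical re-implementation, not Seurat code: `log_normalize()` mimics the default LogNormalize method of `NormalizeData()` (each count divided by the cell's total, multiplied by a scale factor of 10,000, then `log1p`), and `scale_data()` mimics `ScaleData()` (center and scale each gene, capping values at `scale.max`, default 10). It shows that after scaling every gene has variance 1, so a variance-based ranking of genes done on the scaled matrix could not tell genes apart.

```python
import numpy as np

def log_normalize(counts, scale_factor=10_000):
    """Hypothetical LogNormalize sketch: per cell (column), divide by the
    total count, multiply by scale_factor, then apply the natural log1p."""
    totals = counts.sum(axis=0, keepdims=True)
    return np.log1p(counts / totals * scale_factor)

def scale_data(norm, scale_max=10.0):
    """Hypothetical ScaleData sketch: z-score each gene (row) across cells,
    then cap values above scale_max."""
    mean = norm.mean(axis=1, keepdims=True)
    sd = norm.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0  # sketch-only guard so constant genes do not divide by zero
    return np.minimum((norm - mean) / sd, scale_max)

# Toy genes x cells count matrix (made-up numbers).
counts = np.array([
    [0, 2, 10, 1],
    [5, 5, 5, 5],
    [1, 0, 30, 2],
], dtype=float)

norm = log_normalize(counts)
scaled = scale_data(norm)

print(norm.var(axis=1, ddof=1))    # genes differ in variance
print(scaled.var(axis=1, ddof=1))  # every gene now has variance ~1
```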
2) Does `FindMarkers()` work on the normalized data, on the scaled data, or on the clusters obtained by `RunTSNE()`/`RunUMAP()`?
Since `FindMarkers()` finds marker genes for the clusters, I assume its input should be the clusters obtained by `RunTSNE()`/`RunUMAP()`, and these functions in turn use the scaled data or the selected PCs (obtained from `RunPCA()`), as shown below. Therefore, in practice, `FindMarkers()` would also be using the scaled data, whereas it should work on the normalized data.
`RunPCA()` on scaled data --> selection of PCs --> `RunTSNE()`/`RunUMAP()` --> `FindMarkers()`
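In the PBMC tutorial, the cluster labels come from `FindNeighbors()` and `FindClusters()` run on the selected PCs, while `RunTSNE()`/`RunUMAP()` only compute 2-D coordinates for plotting. The NumPy sketch below (synthetic data, not Seurat code) mimics that branch of the pipeline: PCA via SVD on a centered toy matrix, then a deliberately crude sign-of-PC1 split standing in for Seurat's graph-based (Louvain) clustering. The function names are hypothetical. Its output is only a vector of integer labels, one per cell, with no expression values attached.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "scaled" matrix: 50 genes x 40 cells with two hidden groups.
n_genes, n_cells = 50, 40
truth = np.repeat([0, 1], n_cells // 2)
scaled = rng.normal(size=(n_genes, n_cells))
scaled[:10, truth == 1] += 3.0                  # first 10 genes higher in group 1
scaled -= scaled.mean(axis=1, keepdims=True)    # center each gene, as ScaleData does

def run_pca(x, n_pcs):
    """PCA via SVD of a row-centered genes x cells matrix.
    Returns cell embeddings with shape (cells, n_pcs)."""
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    return vt[:n_pcs].T * s[:n_pcs]

def crude_clusters(embeddings):
    """Hypothetical stand-in for FindNeighbors() + FindClusters():
    assign each cell to cluster 0 or 1 by the sign of its PC1 score."""
    return (embeddings[:, 0] > 0).astype(int)

pcs = run_pca(scaled, n_pcs=5)      # analogous to selecting the first PCs
labels = crude_clusters(pcs)        # one integer label per cell
print(labels)
```

Because the sign of a principal component is arbitrary, the recovered labels may be swapped relative to the hidden groups; only the partition matters.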
3) If `FindMarkers()` works on the normalized data (rather than the scaled data), how can it take as input the clusters obtained by `RunTSNE()`/`RunUMAP()`, given that these functions use the scaled data (not the normalized data)?
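The two inputs of a marker test can be kept apart in a small sketch. By default `FindMarkers()` uses `test.use = "wilcox"` on the `data` slot (the normalized values), and the cluster identities only decide which cells are compared with which. The NumPy code below is a hypothetical illustration, not Seurat's implementation: for one cluster versus all other cells it computes the Wilcoxon/Mann-Whitney U statistic per gene, reported as an AUC (U divided by n1 x n2). The labels could come from any clustering step; the values being tested are the normalized matrix.

```python
import numpy as np

def rank_average(x):
    """1-based ranks with ties averaged, as used by the Wilcoxon rank-sum test."""
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average rank of the tied block
        i = j + 1
    return ranks

def marker_auc(norm, labels, cluster):
    """Per gene (row): AUC = U / (n1 * n2) for `cluster` vs all other cells.
    1.0 means every cell in the cluster has a higher value than every other cell."""
    in_grp = labels == cluster
    n1, n2 = in_grp.sum(), (~in_grp).sum()
    aucs = []
    for gene in norm:
        r = rank_average(gene)
        u = r[in_grp].sum() - n1 * (n1 + 1) / 2
        aucs.append(u / (n1 * n2))
    return np.array(aucs)

# Toy log-normalized matrix (genes x cells) and cluster labels; values are made up.
norm = np.array([
    [2.1, 2.5, 1.9, 0.0, 0.0, 0.3],   # gene A: high in cluster 0
    [0.0, 0.2, 0.0, 1.8, 2.2, 2.0],   # gene B: high in cluster 1
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],   # gene C: flat
])
labels = np.array([0, 0, 0, 1, 1, 1])

auc = marker_auc(norm, labels, cluster=0)
print(auc)  # [1.  0.  0.5]
```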
Many thanks for any help and clarifications.
Many thanks for your helpful explanations. I understand the points now. Thanks a lot.
Hi Jared,
Just one more question about the second point of your answer.
My understanding was that `SCTransform` only calculates residuals, which we can access with `GetAssayData(obj, slot="scale.data")`. If I understand you correctly, the values returned by `GetAssayData(obj, slot="data")` are also calculated by `SCTransform`, and in the older Seurat workflow these values are produced by `NormalizeData()`. So is `SCTransform`'s `GetAssayData(obj, slot="data")` equal to `NormalizeData(obj)`? I ask since Seurat is under continuous development and there is always an 'a-ha' moment.

I do not know. I expect they are similar, but I don't know enough about the internals of the `SCTransform` function to say that they should be identical. It is easy enough to test yourself on a toy dataset, though.

Just double-checked:
- `RNA` assay
  - `counts`: stores unnormalized data such as raw counts or TPMs
  - `data`: normalized data matrix
  - `scale.data`: scaled data matrix
- `SCT` assay
  - `counts`: corrected counts
  - `data`: `log1p(counts)`
  - `scale.data`: Pearson residuals

So `SCTransform`'s `GetAssayData(obj, slot="data")` is not equal to the `RNA` assay's `NormalizeData(obj)`.

But another big issue I still cannot work out, even after spending more than 3 hours on it, is when we should use which assay or slot, especially for `FindMarkers`.

I am not sure if this will help, but this is my current understanding; I have only recently started learning about single-cell analysis.
If you follow Seurat's general integration method given here:
1. For the `FindMarkers` function, use the `RNA` assay: reason1 reason2
2. For SCTransform-ed data, you can again use the `RNA` assay or the SCT-normalized values, as recommended in this thread
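Whichever assay you pick, the two `data` slots listed earlier in this thread are on different scales. The NumPy sketch below uses the same made-up toy counts for both: the `RNA`-style values follow LogNormalize (scale factor 10,000), while the `SCT`-style values are `log1p` of "corrected" counts. ASSUMPTION: real SCT corrected counts come from sctransform's fitted regression model; here they are crudely approximated by rescaling every cell to the median sequencing depth, which is only meant to show that the two matrices need not match.

```python
import numpy as np

# Toy genes x cells UMI counts (made-up numbers).
counts = np.array([
    [0, 2, 10, 1],
    [5, 5, 5, 5],
    [1, 0, 30, 2],
], dtype=float)
totals = counts.sum(axis=0)

# RNA assay "data" slot after NormalizeData(): LogNormalize, scale factor 1e4.
rna_data = np.log1p(counts / totals * 10_000)

# SCT assay "data" slot = log1p(corrected counts).
# ASSUMPTION (sketch only): approximate corrected counts by rescaling each
# cell to the median depth; the real values come from the sctransform model.
corrected = counts / totals * np.median(totals)
sct_like_data = np.log1p(corrected)

print(np.round(rna_data, 2))
print(np.round(sct_like_data, 2))
```

Even under this simplification the two matrices differ, here mainly because the effective scale factor (the median depth, 7.5 for these toy counts) is far from 10,000.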
They contradict point 2 in numerous other issues. It has been nearly two years since they introduced that function, and it is still not clear whether SCT-normalized values are truly appropriate for DE in all cases.
The answer to "can I use the SCT assay with typical DE function available in Seurat?" from the Satija lab seems to be something along the lines of "Maybe. Probably. Sometimes. Sometimes probably not."
On the other hand, the `sctransform` author (of the package, not the Seurat `SCTransform` function) has provided a way to do DE on sctransform'ed matrices (including Seurat SCT assays), so who knows at this point. See the vignette.