Question about scRNA-seq
4.3 years ago
tujuchuanli ▴ 100

Hi,

I am analyzing single-cell RNA-seq data generated by our own lab. The whole dataset comprises four time points. My analysis is mainly based on the Seurat tutorial (https://satijalab.org/seurat/v3.1/pbmc3k_tutorial.html). Everything goes fine except for the JackStrawPlot and the UMAP. My JackStrawPlot shows that every component, even PC 20, is highly significant, and many of the clusters in my UMAP plot are not well separated. Below are my JackStrawPlot and UMAP. Is this a serious problem, or is it normal that a different dataset leads to different plots? Do you have any suggestions?

Thanks in advance.

[Figure: JackStrawPlot]

[Figure: UMAP]

4.3 years ago

It would be more useful to see the UMAP colored by your timepoints, to see whether there are differences between or within clusters. It's also not clear how many PCs you're passing to FindNeighbors. The default is the first 10 (dims = 1:10), and you should probably include more than that. If you're only using 10, that is likely hurting your clustering, given the amount of heterogeneity you're ignoring in the remaining PCs. The same applies to the dims you pass to RunUMAP.
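To check the timepoint mix numerically rather than by eye, one option is to cross-tabulate cluster by timepoint. The sketch below is plain base R. It assumes each cell's timepoint is stored in a metadata column. That column is called `timepoint` here purely as a hypothetical name. You would pass it in together with the cluster labels (for example `Idents(pbmc)` and `pbmc$timepoint`). The toy labels are made up and only show the output shape.

```r
# Fraction of each cluster's cells that come from each timepoint.
# `clusters` and `timepoints` are per-cell vectors of equal length; in Seurat
# they could be Idents(pbmc) and pbmc$timepoint ("timepoint" is a hypothetical
# metadata column name).
cluster_composition <- function(clusters, timepoints) {
  stopifnot(length(clusters) == length(timepoints))
  counts <- table(cluster = clusters, timepoint = timepoints)
  # margin = 1 normalizes each row (cluster) so that it sums to 1
  prop.table(counts, margin = 1)
}

# Toy example with made-up labels, only to illustrate the result
toy_clusters   <- c("0", "0", "0", "1", "1", "1", "1", "2", "2")
toy_timepoints <- c("t1", "t1", "t2", "t1", "t2", "t3", "t4", "t3", "t3")
round(cluster_composition(toy_clusters, toy_timepoints), 2)
```

A row dominated by one timepoint marks a timepoint-specific cluster. A row near 0.25 in every column (with four timepoints) marks a cluster shared across time. The matching plot would be `DimPlot(pbmc, reduction = "umap", group.by = "timepoint")`, again assuming that column name.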

I've started defaulting to the SCTransform method, which is more robust at removing technical noise and therefore benefits from additional PCs. So I use more PCs (around 30 to 50).
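A minimal sketch of that workflow, following the steps of Seurat's SCTransform vignette as I understand them. It assumes a Seurat object with raw counts and a `percent.mt` metadata column, as computed in the pbmc3k tutorial. The helper name `run_sct_workflow` and its defaults (30 dims, resolution 0.8) are illustrative, not prescriptions.

```r
# Hypothetical helper: SCTransform normalization, then PCA, graph clustering
# and UMAP on the same set of PCs. Seurat functions are called with Seurat::
# so that defining the helper does not itself load the package.
run_sct_workflow <- function(obj, n_dims = 30, resolution = 0.8) {
  # Replaces NormalizeData/FindVariableFeatures/ScaleData and regresses out
  # the mitochondrial fraction (assumes obj$percent.mt already exists)
  obj <- Seurat::SCTransform(obj, vars.to.regress = "percent.mt", verbose = FALSE)
  obj <- Seurat::RunPCA(obj, verbose = FALSE)  # 50 PCs by default
  # Use the same PCs for the neighbor graph and for the UMAP embedding
  dims <- seq_len(n_dims)
  obj <- Seurat::FindNeighbors(obj, dims = dims, verbose = FALSE)
  obj <- Seurat::FindClusters(obj, resolution = resolution, verbose = FALSE)
  obj <- Seurat::RunUMAP(obj, dims = dims, verbose = FALSE)
  obj
}
```

Usage would be something like `pbmc <- run_sct_workflow(pbmc, n_dims = 30)`. Because RunPCA computes 50 PCs by default, values of `n_dims` up to 50 fit the 30 to 50 range mentioned above.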


Thanks,

It is very helpful; I will try it. There are differences between and within clusters: some clusters consist mainly of cells from one timepoint, while others contain roughly equal percentages of cells from all four timepoints. I just wonder why I didn't get clearly separated clusters like those in the tutorial and in papers. Below is the related code:

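# Build the nearest-neighbor graph from the first 15 PCs
pbmc <- FindNeighbors(pbmc, dims = 1:15)
# Cluster the graph; a higher resolution gives more clusters
pbmc <- FindClusters(pbmc, resolution = 0.5)
# Embed the same 15 PCs in 2D for visualization
pbmc <- RunUMAP(pbmc, dims = 1:15)
DimPlot(pbmc, reduction = "umap", label = TRUE)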

Before running this code, I filtered out cells with low sequencing depth, a low number of detected genes, and a high percentage of reads mapped to mitochondrial (MT) genes.
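For reference, the filtering step in the pbmc3k tutorial looks roughly like the sketch below. The cutoffs are the tutorial's values for PBMCs, not recommendations for this dataset: more than 200 and fewer than 2500 detected genes, and under 5% mitochondrial reads. The `min_counts` sequencing-depth cutoff is an extra, hypothetical parameter. The `^MT-` pattern matches human gene symbols; mouse genes typically use `^mt-`.

```r
# Hypothetical helper mirroring the pbmc3k tutorial's QC filtering.
# nFeature_RNA / nCount_RNA are the per-cell metadata columns Seurat creates
# for an assay named "RNA"; thresholds are tutorial defaults, tune per dataset.
filter_cells <- function(obj, min_features = 200, max_features = 2500,
                         max_mt = 5, min_counts = 0, mt_pattern = "^MT-") {
  # Percentage of each cell's counts that come from mitochondrial genes
  obj[["percent.mt"]] <- Seurat::PercentageFeatureSet(obj, pattern = mt_pattern)
  keep <- obj$nFeature_RNA > min_features &
    obj$nFeature_RNA < max_features &
    obj$nCount_RNA > min_counts &
    obj$percent.mt < max_mt
  # which() drops NA values; keep only cells passing every filter
  subset(obj, cells = colnames(obj)[which(keep)])
}
```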

Additionally:

I used JackStrawPlot to determine the dimensionality of the dataset. According to the tutorial this step comes before FindNeighbors, so the FindNeighbors settings don't affect the JackStrawPlot results. Yet all of the PCs are highly significant in my JackStrawPlot. How is this possible?

Many thanks.
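For completeness, the sketch below wraps the JackStraw steps from the pbmc3k tutorial. It also includes the elbow plot that the tutorial offers as a faster alternative. Per the tutorial, JackStrawPlot compares each PC's distribution of feature p-values against a uniform distribution (the dashed line). "Significant" PCs show an enrichment of low p-values. The helper name and defaults are hypothetical, and the comment on what JackStraw does internally reflects my understanding rather than a line-by-line reading of its code.

```r
# Hypothetical helper wrapping the tutorial's dimensionality checks.
# Assumes RunPCA has already been run on the object.
assess_dimensionality <- function(obj, n_dims = 20, n_replicates = 100) {
  # Permutation test: PCA is repeated on data in which a small fraction of
  # features has been shuffled, giving a null distribution of loadings
  obj <- Seurat::JackStraw(obj, dims = n_dims, num.replicate = n_replicates)
  obj <- Seurat::ScoreJackStraw(obj, dims = seq_len(n_dims))
  list(
    object = obj,
    # Observed vs uniform p-value distributions for each PC
    jackstraw_plot = Seurat::JackStrawPlot(obj, dims = seq_len(n_dims)),
    # Standard deviation of each PC; look for where the curve flattens out
    elbow_plot = Seurat::ElbowPlot(obj, ndims = n_dims)
  )
}
```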


How is this possible?

Because your cells are very heterogeneous, carry technical variation, or are affected by factors such as the cell cycle. Having variation in those higher PCs isn't necessarily a bad thing in itself.
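If the cell cycle is a suspect, one way to check is to score it and see whether the assigned phase lines up with particular clusters. The sketch below uses Seurat's CellCycleScoring with the `cc.genes` lists bundled with Seurat. Those lists are human gene symbols, so other species would need converted lists. The helper name is hypothetical.

```r
# Hypothetical helper: adds S.Score, G2M.Score and Phase metadata columns.
# Assumes the data are already normalized (NormalizeData or SCTransform).
score_cell_cycle <- function(obj) {
  cc <- Seurat::cc.genes  # list with $s.genes and $g2m.genes (human symbols)
  Seurat::CellCycleScoring(obj,
                           s.features = cc$s.genes,
                           g2m.features = cc$g2m.genes,
                           set.ident = FALSE)
}
```

After `pbmc <- score_cell_cycle(pbmc)`, `DimPlot(pbmc, reduction = "umap", group.by = "Phase")` shows whether phase drives any of the cluster structure.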

I just wonder why I didn't get clearly separated clusters like those in the tutorial and in papers.

Well, you aren't using as many PCs as you probably should. Many of your PCs contain significant variation that may help separate those cells/clusters better. I'd recommend trying the method in the vignette I linked (or at least increasing the dims in your code). Additionally, the resolution you're using is fairly low. I'd try some higher values (maybe 0.8 or 1.0) to see whether they discriminate your timepoints better.
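One way to compare resolutions side by side: FindClusters accepts a vector of resolutions and stores one metadata column per value. This sketch assumes PCA has already been run; the 30 dims and the resolution grid are illustrative. It also assumes the column names follow the pattern `<graph>_res.<value>` (e.g. `RNA_snn_res.0.8` or `SCT_snn_res.0.8`).

```r
# Hypothetical helper: cluster at several resolutions on more PCs and report
# how many clusters each resolution produces.
sweep_resolutions <- function(obj, n_dims = 30, resolutions = c(0.5, 0.8, 1.0)) {
  obj <- Seurat::FindNeighbors(obj, dims = seq_len(n_dims), verbose = FALSE)
  # One call, one metadata column per resolution
  obj <- Seurat::FindClusters(obj, resolution = resolutions, verbose = FALSE)
  meta <- obj@meta.data
  # Note: this also picks up columns left over from earlier FindClusters runs
  res_cols <- grep("_res\\.", colnames(meta), value = TRUE)
  n_clusters <- vapply(res_cols, function(col) length(unique(meta[[col]])),
                       integer(1))
  list(object = obj, n_clusters = n_clusters)
}
```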

Clustering is a balancing act with respect to resolution: you want clusters as fine-grained as possible without forcing superficial differences, and this is where your biological expertise comes in. Only you can determine whether the cluster markers make biological sense or whether it's just the software grasping at straws.
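To judge whether clusters make biological sense, the usual next step is to inspect their markers. The sketch below uses FindAllMarkers with the thresholds from the pbmc3k tutorial: positive markers only, min.pct = 0.25, logfc.threshold = 0.25. The helper name and `n` are hypothetical. Sorting by `p_val_adj` avoids depending on the fold-change column, which is named `avg_logFC` in Seurat v3 and `avg_log2FC` in later versions.

```r
# Hypothetical helper: top-n positive markers per cluster for the current identities.
top_markers <- function(obj, n = 5) {
  markers <- Seurat::FindAllMarkers(obj, only.pos = TRUE,
                                    min.pct = 0.25, logfc.threshold = 0.25)
  # Order by cluster, then by adjusted p-value within each cluster
  markers <- markers[order(markers$cluster, markers$p_val_adj), ]
  # Keep the first n rows of each cluster and stack them back together
  per_cluster <- lapply(split(markers, markers$cluster), head, n = n)
  do.call(rbind, per_cluster)
}
```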


Thanks,

I will try it. Below is my new attempt using your recommended method.


Hi,

I have tried the new method, SCTransform, and found that it is much better, at least in the JackStrawPlot. Below is my new plot.

Just one more question: why are the PCs not sorted by p-value? For example, the p-value of PC14 is much more significant than those of PC10-13.

[Figure: New JackStrawPlot]


They're presumably sorted by the amount of variance they account for. I usually just use elbow plots for this purpose, so I don't know exactly what the p-values are derived from here.
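Both points can be illustrated in base R on a toy matrix. This is a rough illustration, not Seurat's exact code. First, PCA returns components in decreasing order of variance. Second, a jackstraw-style p-value asks whether more genes load on a PC than permuted "null" genes do. That is a different quantity from variance, so nothing forces it to decrease with the PC index.

The sketch follows the general jackstraw idea: permute a small fraction of genes, redo PCA, and use the permuted genes' loadings as a null. Two parts are my own choices. The final per-PC test is a one-sided binomial test, swapped in for Seurat's own scoring. The function name, toy data and defaults are all hypothetical.

```r
# Simplified jackstraw-style test (illustration only, not Seurat's implementation).
# mat: genes x cells matrix, e.g. scaled expression.
jackstraw_sketch <- function(mat, n_pcs = 5, n_reps = 20, prop_perm = 0.05,
                             p_thresh = 0.05, seed = 1) {
  set.seed(seed)
  # PCA over cells; rotation holds one loading per gene and PC
  real <- prcomp(t(mat), rank. = n_pcs)
  real_load <- abs(real$rotation)
  n_perm <- max(1, round(nrow(mat) * prop_perm))
  null_load <- NULL
  for (r in seq_len(n_reps)) {
    idx <- sample(nrow(mat), n_perm)
    perm <- mat
    # Shuffle each selected gene across cells, breaking its link to the PCs
    perm[idx, ] <- t(apply(mat[idx, , drop = FALSE], 1, sample))
    fake <- prcomp(t(perm), rank. = n_pcs)
    null_load <- rbind(null_load, abs(fake$rotation[idx, , drop = FALSE]))
  }
  pc_p <- vapply(seq_len(n_pcs), function(k) {
    # Empirical p-value of each gene's loading on PC k against the null loadings
    gene_p <- vapply(real_load[, k], function(x) {
      (1 + sum(null_load[, k] >= x)) / (1 + nrow(null_load))
    }, numeric(1))
    # Are there more small p-values than expected if the PC were pure noise?
    binom.test(sum(gene_p <= p_thresh), length(gene_p), p = p_thresh,
               alternative = "greater")$p.value
  }, numeric(1))
  data.frame(PC = seq_len(n_pcs),
             # share of total variance; prcomp orders PCs by this
             var_explained = real$sdev[seq_len(n_pcs)]^2 / sum(real$sdev^2),
             p_value = pc_p)
}

# Toy data: 200 genes x 60 cells of noise, with a group shift in the first 20 genes
set.seed(42)
toy <- matrix(rnorm(200 * 60), nrow = 200)
group <- rep(c(0, 3), each = 30)
toy[1:20, ] <- toy[1:20, ] + matrix(group, nrow = 20, ncol = 60, byrow = TRUE)
res <- jackstraw_sketch(toy)
print(res)
```

In this toy setup only PC1 carries real structure, so its p-value should be small, while the noise PCs' p-values bounce around without any ordering. On real data, a later PC may capture a sharp signal, such as a small distinct population. Such a PC can score better than earlier PCs that explain more but more diffuse variance. That is one plausible reading of the PC14 vs PC10-13 pattern.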
