Question: Question about scRNA-seq
5 months ago, tujuchuanli wrote:

Hi,

I am analyzing single-cell RNA-seq data generated by our own lab. The dataset comprises four time points. My analysis is mainly based on the Seurat tutorial (https://satijalab.org/seurat/v3.1/pbmc3k_tutorial.html). Everything goes fine except for the JackStrawPlot and UMAP. My JackStrawPlot shows that every component, even PC 20, is highly significant, and many of the clusters in my UMAP plot are not well separated. Below are my JackStrawPlot and UMAP. Is this a serious problem, or is it normal for a different dataset to produce different plots? Do you have any suggestions?

Thanks in advance.

[Figure: JackStrawPlot]

[Figure: UMAP]

rna-seq • 376 views
modified 5 months ago by jared.andrews07 • written 5 months ago by tujuchuanli
5 months ago, jared.andrews07 (St. Louis, MO) wrote:

It would be more useful to see the UMAP colored by timepoint, to check for differences between and within clusters. Additionally, it's not clear how many PCs you're feeding to FindNeighbors. The default is 10 (dims = 1:10), and given your JackStrawPlot you should probably include more than that. If you're only using 10, it's likely affecting your clustering, given the amount of heterogeneity you're ignoring in the other PCs. The same is true for your UMAP call.

I've started defaulting to the SCTransform method, which is more robust and removes technical noise, so it benefits from additional PCs. With it I use more PCs (around 30-50).
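A minimal sketch of that workflow, assuming a Seurat v3-style object. The function name `run_sct_workflow`, the metadata column name `timepoint`, and the defaults (30 PCs, resolution 0.8) are illustrative assumptions, not values taken from this dataset:

```r
# Hypothetical helper: SCTransform normalization followed by PCA, clustering,
# and UMAP, all using the same (larger) number of PCs.
# Seurat functions are namespace-qualified, so the definition itself loads
# without attaching the package; Seurat is only needed when it is called.
run_sct_workflow <- function(obj, n_dims = 30, resolution = 0.8) {
  if (!requireNamespace("Seurat", quietly = TRUE)) {
    stop("The Seurat package is required to run this sketch.")
  }
  stopifnot(n_dims >= 2, n_dims <= 50)  # RunPCA below computes 50 PCs

  obj <- Seurat::SCTransform(obj, verbose = FALSE)       # replaces NormalizeData/ScaleData
  obj <- Seurat::RunPCA(obj, npcs = 50, verbose = FALSE)

  dims <- seq_len(n_dims)                                # use the same PCs everywhere
  obj <- Seurat::FindNeighbors(obj, dims = dims)
  obj <- Seurat::FindClusters(obj, resolution = resolution)
  obj <- Seurat::RunUMAP(obj, dims = dims)
  obj
}

# Usage (assumes `pbmc` is an existing Seurat object whose metadata has a
# column named "timepoint"; that column name is an assumption):
# pbmc <- run_sct_workflow(pbmc, n_dims = 30)
# Seurat::DimPlot(pbmc, reduction = "umap", group.by = "timepoint")
```

Passing the same `dims` to both FindNeighbors and RunUMAP keeps the clustering and the embedding based on the same information, which makes the plot easier to interpret.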

written 5 months ago by jared.andrews07

Thanks,

It is very helpful, I will try it. There are differences between and within clusters: in some clusters, cells come mainly from one timepoint, while other clusters comprise roughly equal percentages of cells from all four timepoints. I just wonder why I didn't get clearly separated clusters like those in the manual and in papers. Below is the relevant code:

pbmc <- FindNeighbors(pbmc, dims = 1:15)
pbmc <- FindClusters(pbmc, resolution = 0.5)
pbmc <- RunUMAP(pbmc, dims = 1:15)
DimPlot(pbmc, reduction = "umap", label = TRUE)

Before running this code, I filtered out cells with low sequencing depth, a low percentage of detected genes, or a high percentage of reads mapped to MT genes.

Additionally:

According to the manual, I determine the ‘dimensionality’ of the dataset with “JackStrawPlot” before calling “FindNeighbors”, so the “FindNeighbors” settings don’t affect the “JackStrawPlot” results. I find that all of the PCs are highly significant in my “JackStrawPlot”. How is this possible?

Many thanks.

modified 5 months ago • written 5 months ago by tujuchuanli

> How is this possible?

Because your cells are either very heterogeneous, have technical variation, or are being affected by things like cell cycle. In itself, having variation in those higher PCs isn't necessarily a bad thing.

> I just wonder why I didn't get clearly separated clusters like those in the manual and in papers.

Well, you probably aren't using as many PCs as you should be if many of them contain significant variation that could help differentiate those cells/clusters. I'd recommend trying the method in the vignette I linked (or at least increasing the dims used in your code). Additionally, the resolution you're using is pretty low. I'd play with some higher values (0.8, 1.0 maybe) as well to see if they yield better discrimination between your timepoints.
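One way to compare resolutions is to recluster at each value and tabulate which timepoints make up each cluster. The following is a rough sketch, not a definitive recipe: the helper names, the `timepoint` metadata column, and the toy labels at the end are all assumptions made for illustration.

```r
# Row-wise proportions of timepoints within each cluster (base R only).
timepoint_composition <- function(clusters, timepoints) {
  prop.table(table(cluster = clusters, timepoint = timepoints), margin = 1)
}

# Hypothetical sketch: recluster at several resolutions and tabulate the
# timepoint composition of the resulting clusters.
# Assumes FindNeighbors() has already been run on `obj` and that its metadata
# contains the column named by `timepoint_col` (the name is an assumption).
sweep_resolution <- function(obj, resolutions = c(0.5, 0.8, 1.0),
                             timepoint_col = "timepoint") {
  if (!requireNamespace("Seurat", quietly = TRUE)) {
    stop("The Seurat package is required to run this sketch.")
  }
  results <- lapply(resolutions, function(res) {
    clustered <- Seurat::FindClusters(obj, resolution = res, verbose = FALSE)
    timepoint_composition(Seurat::Idents(clustered),
                          clustered@meta.data[[timepoint_col]])
  })
  names(results) <- paste0("res_", resolutions)
  results
}

# Toy check of the tabulation with made-up labels (not real data):
toy <- timepoint_composition(c("0", "0", "1", "1"), c("d0", "d7", "d7", "d7"))
print(toy)
```

In the toy table, cluster "0" is split evenly between the two timepoints while cluster "1" comes entirely from one, which is the kind of pattern to look for when judging whether a higher resolution separates timepoints better.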

Clustering is a balancing act in terms of resolution: you want clusters as fine-grained as possible without forcing superficial differences, and this is where your biological expertise comes in. Only you can determine whether the cluster markers make biological sense or if it's just the software grasping at straws.

written 5 months ago by jared.andrews07

Thanks,

I will try it. Below is my new attempt with your recommended method.

modified 5 months ago • written 5 months ago by tujuchuanli

Hi,

I have tried the new method, SCTransform, and find that it is much better, at least in the JackStrawPlot. Below is my new plot.

Just one more question: why are the PCs not sorted by p-value? For example, the p-value of PC14 is much more significant than those of PC10-13.

[Figure: New JackStrawPlot]

modified 5 months ago • written 5 months ago by tujuchuanli

They're presumably sorted by the amount of variance they account for, not by p-value. I usually just use elbow plots for this purpose, so I don't know exactly what the p-values are derived from here.
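The ordering by variance can be illustrated with base R's `prcomp` on a toy matrix. This is only a sketch on random numbers (not scRNA-seq data), and it does not reproduce Seurat's JackStraw p-values; as far as the Seurat documentation describes it, ScoreJackStraw derives each PC's p-value from a separate test on per-feature p-values, so those need not decrease with PC number. In Seurat itself, the corresponding elbow plot is drawn with `ElbowPlot(obj, ndims = 50)`.

```r
set.seed(1)

# Toy matrix: 200 "cells" x 30 "genes" of random noise (purely illustrative).
x <- matrix(rnorm(200 * 30), nrow = 200, ncol = 30)

pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each PC.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# PCA returns components in decreasing order of variance explained,
# so this check is TRUE regardless of any per-PC significance score.
is_decreasing <- !is.unsorted(rev(var_explained))
print(is_decreasing)

# Base-R elbow plot: look for the point where the curve flattens out.
plot(var_explained, type = "b",
     xlab = "PC", ylab = "Proportion of variance explained")
```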

written 5 months ago by jared.andrews07