How to choose a method to integrate samples from different patients in Seurat
21 months ago
623202215

Hi,

I have three patients, each with a normal and a tumor tissue sample (10x technology), so there are six samples in total. I want to combine them and find out whether there are differences between normal and tumor samples within specific cell types (such as immune cells). Because the samples come from different patients, there may be a batch effect, and I am struggling to choose an integration method.

My questions are:

1. Among the Seurat integration method, the SCTransform method and Harmony, which one is more suitable for my case? Or should I just merge the six samples directly?

2. If I use the Seurat integration pipeline or the SCTransform pipeline, they recommend using the "RNA" assay instead of the "integrated" assay for FindMarkers. As I understand it, the "RNA" assay is the batch-uncorrected matrix; will that cause problems? Or are there better options?

Thanks a lot; I hope to get your suggestions!

Best, Wei

scRNA Seurat batch effect harmony integrate
21 months ago
1. The "best" method for your data is subjective. It is fairly easy to try several of the well-touted ones and see which look best for your data. This paper is a great review of the available methods and how they compare.

Do you care about patient-specific differences? If so, integrating may be against your best interest. Running without integration and seeing how samples cluster is likely a good starting point.
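A minimal sketch of "seeing how samples cluster" without integration: cross-tabulate cells by cluster and sample, then flag clusters where one sample dominates. It assumes cluster and sample labels have been exported as plain vectors; in R, `table(Idents(obj), obj$orig.ident)` gives the same cross-tabulation (using `orig.ident` as the sample column is an assumption about how the object was built). The sample names and the 0.8 threshold are hypothetical. A cluster dominated by one sample is not automatically a batch artifact: in tumor samples it may be genuine patient-specific biology, which is exactly what integration could erase.

```python
import numpy as np

def sample_fractions(clusters, samples):
    """Cross-tabulate cells by cluster and sample, then normalise each row.

    Returns (cluster_ids, sample_ids, frac) where frac[i, j] is the fraction of
    cells in cluster_ids[i] that come from sample_ids[j].
    """
    clusters = np.asarray(clusters)
    samples = np.asarray(samples)
    cluster_ids = np.unique(clusters)
    sample_ids = np.unique(samples)
    counts = np.array([[np.sum((clusters == c) & (samples == s)) for s in sample_ids]
                       for c in cluster_ids], dtype=float)
    frac = counts / counts.sum(axis=1, keepdims=True)
    return cluster_ids, sample_ids, frac

def sample_dominated_clusters(clusters, samples, threshold=0.8):
    """Clusters in which a single sample contributes more than `threshold` of the cells."""
    cluster_ids, _, frac = sample_fractions(clusters, samples)
    return [str(c) for c, row in zip(cluster_ids, frac) if row.max() > threshold]

# Toy labels (hypothetical): cluster "A" mixes all six samples,
# cluster "B" consists entirely of cells from P3_T.
clusters = ["A"] * 6 + ["B"] * 5
samples = ["P1_N", "P1_T", "P2_N", "P2_T", "P3_N", "P3_T"] + ["P3_T"] * 5
print(sample_dominated_clusters(clusters, samples))  # ['B']
```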

2. Generally, using the RNA assay is the way to go. Yes, it uses the original (uncorrected) expression values, but the clusters are still defined using your integrated data. The Seurat devs are also working on making the residuals in the SCT assay a viable option for differential expression analysis, but I don't think that's recommended yet (though it should yield similar results).
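To make "the original counts" concrete: FindMarkers() reads the `data` slot of the active assay by default, which for the RNA assay normally holds the output of NormalizeData() with its default "LogNormalize" method (each cell's counts divided by the cell total, multiplied by a scale factor of 10,000, then natural-log transformed with log1p). This assumes NormalizeData() was run on the RNA assay. The numpy sketch below reproduces that arithmetic on a hypothetical genes x cells matrix; it is an illustration, not Seurat's implementation, and it shows that nothing batch-related is removed at this step.

```python
import numpy as np

def log_normalize(counts, scale_factor=10_000):
    """LogNormalize-style transform of a genes x cells count matrix.

    Each cell's counts are divided by that cell's total count, multiplied by
    scale_factor and transformed with the natural log1p. No batch information
    is used, so sample-to-sample differences are left in place.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0, keepdims=True)  # per-cell library size
    return np.log1p(counts / totals * scale_factor)

# Toy matrix: 2 genes x 2 cells (hypothetical counts).
counts = np.array([[10, 0],
                   [90, 50]])
norm = log_normalize(counts)
print(norm.round(3))
```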

Hi Jared,

Thanks very much, that's very helpful. I will try them and see which is the "best" solution. In addition, I ran into several technical problems when running the Seurat integration pipeline, and I would appreciate any comments.

1. In Seurat's integration pipeline, each sample is normalized individually -> FindIntegrationAnchors -> IntegrateData -> ScaleData, and the scaled data are stored in the "integrated" assay; there is no scaled data in the RNA assay. If I switch to the RNA assay to run FindAllMarkers and want to plot the markers in a heatmap, how can I do that? Should I run ScaleData again on the RNA assay, use the integrated assay's scaled data, or scale the data before integration?

2. Another question is about regression. I noticed that some people regress out unwanted signals, such as mito.percent and UMI counts, during ScaleData. How does Seurat perform this regression? Could you describe it more specifically? I am not clear about what changes before and after regression. Sorry for the naive question; I am trying to understand Seurat better.

Thanks again; I hope to get your suggestions!

Best, Wei

1. I have also run into this. I generally scale the RNA assay after integration so that the heatmap works properly. I've tried scaling before integration (after merging all samples into a single Seurat object), but if I remember correctly, the scaled data get removed for some reason during integration.

2. The regression essentially removes variation between cells that is due to differences in the given variable(s). I recommend reading the Seurat papers or asking on their GitHub if you want a better explanation. The papers go into much more detail, and several related questions have been asked on their GitHub issues page that you can find with a bit of searching.
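Both points come down to what ScaleData() computes. Its documentation states that variables given in vars.to.regress are regressed against each feature and the resulting residuals are then scaled and centered, with values capped at scale.max (10 by default). DoHeatmap() plots the scale.data slot by default, which is why the RNA assay needs its own ScaleData() run before the heatmap. The numpy sketch below mimics the default linear model on simulated data; the two genes, the UMI covariate and the use of a sample standard deviation (ddof=1) are assumptions, and this is not Seurat's implementation (Seurat also offers Poisson and negative binomial models via model.use).

```python
import numpy as np

def scale_data(data, covariates=None, scale_max=10.0):
    """Sketch of ScaleData-like processing on a genes x cells matrix.

    1. Optionally regress covariates (cells x k) out of every gene with ordinary
       least squares (intercept included) and keep the residuals.
    2. Centre and scale each gene to mean 0 and SD 1 (ddof=1, an assumption).
    3. Cap values above scale_max.
    """
    data = np.asarray(data, dtype=float)
    if covariates is not None:
        X = np.column_stack([np.ones(data.shape[1]), np.asarray(covariates, dtype=float)])
        # One least-squares fit per gene, solved jointly: beta is (k + 1) x genes.
        beta, *_ = np.linalg.lstsq(X, data.T, rcond=None)
        data = data - (X @ beta).T
    mean = data.mean(axis=1, keepdims=True)
    sd = data.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant genes become all zeros instead of NaN
    return np.minimum((data - mean) / sd, scale_max)

rng = np.random.default_rng(0)
n_cells = 500
umi = rng.uniform(1000, 5000, n_cells)                   # hypothetical UMI counts per cell
group = rng.integers(0, 2, n_cells)                      # hypothetical biological state
depth_gene = 0.001 * umi + rng.normal(0, 0.1, n_cells)   # tracks sequencing depth only
marker_gene = 2.0 * group + rng.normal(0, 0.1, n_cells)  # depends on biology only
data = np.vstack([depth_gene, marker_gene])

plain = scale_data(data)                       # no regression
regressed = scale_data(data, covariates=umi)   # UMI count regressed out
print(np.corrcoef(plain[0], umi)[0, 1])        # close to 1: depth signal remains
print(np.corrcoef(regressed[0], umi)[0, 1])    # about 0: depth signal removed
```

In Seurat terms, the first call corresponds to ScaleData() without vars.to.regress and the second to something like `ScaleData(obj, vars.to.regress = "nCount_RNA")`. Note that the biological difference in the marker gene survives the regression because it is unrelated to UMI count.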


Hi Jared,

Thanks for your timely help; I will explore the related papers and GitHub issues. Thanks again!

Best, Wei


Hi Jared,

Thanks for your reply. I have one more question; if you don't mind, I hope you can give me some suggestions. Sorry to bother you again!

1. When using the Seurat integration pipeline, how can we sub-cluster a specific cluster? For example, to study T cell heterogeneity, I want to plot this cluster on its own in a new UMAP and find further sub-clusters. I noticed there is a heated discussion on the Seurat GitHub, but there seems to be no clear consensus. My current pipeline is to subset this cluster, split it by sample, and run the integration pipeline again. I would really appreciate it if you could help me optimize this pipeline or give me some advice.

Best, Wei


Yeah, the way forward for doing that isn't clear. As I see it, there are two options. The first is your current approach. The second is to do clustering as fine-grained as you think you'll need with all your samples (increasing the resolution to 1.5 or more), take your subset, and just roll with that. My guess is that your current approach will better serve your purpose.

Unfortunately, there's no clear answer, so you might have to experiment a bit to determine what yields the best results for your data.
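A toy illustration of one reason re-running dimensionality reduction on the subset can expose finer structure: on all cells, PC1 is dominated by the large difference between two major cell types, while on one cell type alone, PC1 picks up the weaker within-type split. The data are simulated, the cell-type labels are hypothetical, and plain PCA via SVD stands in for Seurat's FindVariableFeatures/ScaleData/RunPCA steps; the sketch does not model batch correction or re-integration.

```python
import numpy as np

def top_pc1_gene(data):
    """Index of the gene with the largest |loading| on PC1 of a cells x genes matrix."""
    centered = data - data.mean(axis=0)
    # Rows of vt are principal axes in gene space; vt[0] is PC1.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return int(np.argmax(np.abs(vt[0])))

rng = np.random.default_rng(1)
n_genes = 20
cells = rng.normal(0.0, 0.1, size=(200, n_genes))  # cells x genes, background noise
cell_type = np.repeat(["T", "B"], 100)              # hypothetical major cell types
subtype = np.repeat([0, 1, 0], [50, 50, 100])       # subtype split inside T cells only
cells[cell_type == "T", 0:5] += 5.0                 # strong T vs B difference in genes 0-4
cells[subtype == 1, 5:10] += 1.0                    # weak T-subtype difference in genes 5-9

print(top_pc1_gene(cells))                    # a gene in 0-4: PC1 separates T from B
print(top_pc1_gene(cells[cell_type == "T"]))  # a gene in 5-9: PC1 separates T subtypes
```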


Hi Jared,

Thanks very much; I learned a lot from your answer. I agree that subsetting and running the integration pipeline again may be better.

I also tested the dims parameter of FindIntegrationAnchors() and IntegrateData(): in the first, it specifies which dimensions from the CCA are used to define the neighbor search space; in the second, it is the number of PCs used in the anchor weighting procedure. With different values, the results can differ a lot. I know there is no clear answer and it depends on our purpose, but I wonder if you are familiar with these two parameters. Computationally, should the two values be the same (e.g., both 25, 30 or 35)? My intuition is that the two parameters are not closely related. I would appreciate it if you could discuss this.

Thanks

Best, Wei


I haven't used CCA (and I've seen the Seurat folks recommend the new integration method over it), so I'm not sure about that one.


I will explore it. Thanks a lot for your patient and timely help!