Question: scRNA-seq analysis of two data-sets
16 months ago, piyushjo wrote:


I have two datasets from two different yet related cell lines: pre-relapse and post-relapse cancer cell lines from the same patient.

I have performed single-cell sequencing on both of them, with the hypothesis that I will be able to find rare cell populations in either line that are key to relapse.

To do that I am following two protocols:

1) I am following the Seurat tutorial on integrating stimulated and control PBMC datasets.

2) I am following a second tutorial that involves no integration step.

However, these two approaches give different results. With 1), most cells from the two lines look similar, and only a few cells from each are distinct. With 2), I get the opposite, and expected, result: most cells differ between the lines, with only a few in common.

Which one is the correct way to analyze the data? Are the two approaches giving different results because they are doing different things, with 1) finding common genes and 2) finding distinct genes?

Tags: seurat, scrna-seq
16 months ago, jared.andrews07 (Memphis, TN) wrote:

These are very different approaches. The first tries to account for technical variation between the two sets, mapping similar cells between the two to each other. The second literally just merges the columns and rows of the two sets together into one object - it's not doing any special normalization. It's just a straight merging of data.
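To make the "straight merging" idea concrete, here is a minimal toy sketch in plain Python. It is not Seurat code: the data layout, the function name `merge_counts`, and the gene and cell names are all hypothetical, and filling genes that are absent from one dataset with zero counts is an assumption made for illustration.

```python
def merge_counts(a, b):
    """Concatenate two gene-by-cell count tables without any normalization.

    Each table is a dict with "genes" (row names), "cells" (column names)
    and "counts" (one list of per-cell counts for each gene).
    """
    # Rows: union of the genes seen in either dataset.
    genes = sorted(set(a["genes"]) | set(b["genes"]))
    # Columns: cells of the first dataset followed by cells of the second.
    cells = a["cells"] + b["cells"]
    counts = []
    for gene in genes:
        row = []
        for ds in (a, b):
            if gene in ds["genes"]:
                row.extend(ds["counts"][ds["genes"].index(gene)])
            else:
                # Assumption: a gene missing from one dataset gets zero counts.
                row.extend([0] * len(ds["cells"]))
        counts.append(row)
    return {"genes": genes, "cells": cells, "counts": counts}


# Hypothetical toy data: two pre-relapse cells and one post-relapse cell.
pre = {"genes": ["GENE_A", "GENE_B"], "cells": ["pre_1", "pre_2"],
       "counts": [[5, 0], [1, 3]]}
post = {"genes": ["GENE_B", "GENE_C"], "cells": ["post_1"],
        "counts": [[2], [7]]}

merged = merge_counts(pre, post)
```

Nothing in this step looks at which cells resemble each other, so any technical offset between the two samples is carried straight into the merged table.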

It's tough for us to say which is more appropriate - the first may help you to identify cell populations that truly differentiate the two, but it could be blowing away real differences due to how Seurat's integration works. It will force populations that aren't similar together if there aren't many overlapping cell types between the two samples. The second may be revealing significant technical variation or batch effects, or it could just be that your cell lines are quite different from each other. You are in the best position to determine if this is the case or not - we know nothing about your samples.
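The risk of "blowing away real differences" can be shown with a deliberately naive toy correction. The sketch below is not Seurat's anchor-based integration; it simply subtracts each sample's own mean from a single hypothetical gene, which is the crudest possible way of assuming that all between-sample differences are technical. The expression values are made up.

```python
from statistics import mean


def center_per_batch(values_by_batch):
    """Subtract each batch's own mean from its values (naive 'correction')."""
    return {
        batch: [v - mean(vals) for v in vals]
        for batch, vals in values_by_batch.items()
    }


# One hypothetical gene, one value per cell. In this toy setup the two
# samples share no cell state, so the gap between them is biological.
expr = {"pre": [1.0, 1.5, 0.5], "post": [5.0, 5.5, 4.5]}

gap_before = mean(expr["post"]) - mean(expr["pre"])

corrected = center_per_batch(expr)
gap_after = mean(corrected["post"]) - mean(corrected["pre"])

print(gap_before, gap_after)
```

After centering, the between-sample gap is gone even though it was real by construction. Proper integration methods are far more careful than this, but the same failure mode applies when the samples share few cell populations.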

This is where the true difficulty of RNA-seq analysis lies - nobody is really going to be able to tell which is truly correct.

You might try other integration methods if you feel you have batch effects or technical variation that needs to be addressed. Personally, I've found the SeuratWrappers interface to fastMNN to be quite good, as it handles samples that share few cell types much more appropriately.
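The intuition behind mutual nearest neighbors (MNN) can be sketched in a few lines. This is only the pair-finding idea with a single neighbor (k = 1) on made-up 2D coordinates, not the fastMNN implementation, which works on many neighbors in a reduced-dimension space. The function names and points are hypothetical.

```python
def nearest(point, candidates):
    """Index of the candidate closest to point (squared Euclidean distance)."""
    return min(
        range(len(candidates)),
        key=lambda j: sum((p - q) ** 2 for p, q in zip(point, candidates[j])),
    )


def mutual_nearest_pairs(batch_a, batch_b):
    """Return (i, j) pairs where a[i] and b[j] are each other's nearest cell."""
    pairs = []
    for i, cell in enumerate(batch_a):
        j = nearest(cell, batch_b)
        # Keep the pair only if the relationship holds in both directions.
        if nearest(batch_b[j], batch_a) == i:
            pairs.append((i, j))
    return pairs


# Hypothetical cells: the third cell of batch_a has no counterpart in batch_b.
batch_a = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
batch_b = [(0.0, 1.0), (2.0, 0.0)]

pairs = mutual_nearest_pairs(batch_a, batch_b)
print(pairs)  # [(0, 0), (1, 1)]
```

The distant cell in `batch_a` ends up in no pair, so it is not used to estimate the correction. That is the sense in which an MNN-style method can be gentler on populations that exist in only one sample.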


OK, I understand your clarification, but what would you suggest when I really want to find something that connects two sets that I know for sure are different overall? For example, if I have a differentiating neuronal culture sampled at Day 1 and Day 2, and I am really interested in finding the transient population at Day 1 that becomes the dominant population at Day 2, which "merging" method would be most beneficial?

I will also perform trajectory analysis after the basic Seurat workflow to find markers and for visualization.

(reply by piyushjo, 16 months ago)

In such a case as that, if I wasn't worried about batch effects and my starting cells were the same, I'd just straight merge the data. I don't know much about neuronal differentiation, but I expect that there'd still be significant overlap. Being different from each other is fine - you'd just expect that they aren't completely different after such a short time.

If you can avoid integration, do so. Unless you know you have confounding batch or other technical effects, there is no reason to complicate your analysis, and integration may even "correct" out real biological differences.

(reply by jared.andrews07, 16 months ago)

Thanks for your insight!

(reply by piyushjo, 16 months ago)