Question

Merge or integrate multiple samples and then do downstream analysis

6 months ago

diqixiaoyaoer ▴ 10

I am analyzing a public scRNA-seq dataset to look at differential gene expression between two clusters.

Basic sample information:

tissue_donor_1_treatment
tissue_donor_2_treatment
tissue_donor_1_control
tissue_donor_1_control

All of them were produced under the same sequencing conditions.

I want to divide them into two groups: treatment and control.

Following the Seurat v3 tutorial https://satijalab.org/seurat/archive/v3.1/immune_alignment.html, I ran a similar analysis and got a result.

My question is whether I need to do integration to remove batch effects, given my original purpose (to see differential gene expression between treatment and control).

I did so and got some results.

But I also simply merged the samples, skipped the integration step, and ran the same analysis. The resulting clusters were very different from those produced with integration.

My second question: if the choice between simply merging and integrating depends on the purpose of the analysis, could somebody suggest which merge or integration approach is better suited to differential gene expression?

For example, at the merge step I merged all samples in one step and then followed the basic analysis workflow. Should I merge them by group instead?

I know there are many detailed tutorials, but I really hope somebody can help me with these questions.

Thank you in advance.

Seurat • 593 views

updated 5 months ago by ATpoint 82k • written 6 months ago by diqixiaoyaoer ▴ 10

Anyone here?

ADD REPLY • link 6 months ago by diqixiaoyaoer ▴ 10

It's not an on-demand service here. People will respond if they feel they have an answer; it's not appropriate to make comments like that 2h after posting a question.

ADD REPLY • link 6 months ago by ATpoint 82k

score 0 · Answer 1 · 2023-11-09

My question is if I need to do integration to remove batch effect based on my original purpose (to see differential gene expression between treatment and control)?

Differential expression itself does not depend on integration, because integration is a per-cell process that tries to remove the batch effect in PCA (or similar) space. The corrected per-gene values are not suitable for DE and should only be used for per-cell analyses such as the mentioned PCA, trajectory inference, etc.

For DE you might want to include donor in your design, for example donor + group, as donor can be a confounder. Run PCA (which is part of standard workflows anyway) on the non-integrated data to see whether donor confounds the analysis.
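To make the donor + group idea concrete, here is a minimal sketch in plain Python (no Seurat involved). It assumes one expression value per sample for a single gene, with every donor contributing both a control and a treatment sample; the donor names and numbers are made up for illustration. For such a balanced paired layout, testing the group term of donor + group is equivalent to a paired t-test on the within-donor differences, so the sketch compares that against a two-sample t statistic that ignores donor.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical per-sample expression of one gene (e.g. a log-scale
# per-sample summary). All values are invented for illustration only.
expr = {
    ("donor_1", "control"): 5.0,
    ("donor_1", "treatment"): 6.0,
    ("donor_2", "control"): 8.0,
    ("donor_2", "treatment"): 9.2,
    ("donor_3", "control"): 2.0,
    ("donor_3", "treatment"): 2.9,
}

donors = sorted({donor for donor, _ in expr})
ctrl = [expr[(d, "control")] for d in donors]
trt = [expr[(d, "treatment")] for d in donors]


def unpaired_t(a, b):
    """Two-sample t statistic that ignores donor (design: group only)."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(b) - mean(a)) / se


def paired_t(a, b):
    """t statistic on within-donor differences (balanced donor + group)."""
    diffs = [y - x for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


effect = mean(trt) - mean(ctrl)
t_naive = unpaired_t(ctrl, trt)
t_blocked = paired_t(ctrl, trt)
print(f"effect={effect:.2f}  t ignoring donor={t_naive:.2f}  "
      f"t with donor={t_blocked:.2f}")
```

The estimated treatment effect is the same either way in this balanced toy example. What changes is the noise it is compared against: large donor-to-donor differences inflate the standard error when donor is ignored, while blocking on donor removes them. A real analysis would fit the design with a dedicated DE tool on uncorrected counts rather than a hand-rolled t statistic.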

The question is what you want to compare. If you first need to identify a certain cluster/cell type/subset in your data, then you might need integration, because donor/batch/day/whatever might confound the clustering. This cannot really be answered with the given information.