Question

Combining multiple 10x scRNAseq datasets

2.3 years ago

Alex Gibbs ▴ 80

Hi everyone,

Wondering if someone can provide me with some guidance.

I have previously sequenced 4 skin cancers using 10X chemistry and would like to combine them into one dataset. My research question concerns cancer stem cell populations, so I need good sensitivity.

I have done a load of reading and I am still confused as to what the correct method is.

CellRanger Aggregate causes massive downsampling in 2 of the 4 samples. I believe this is because the number of cells varies between samples (~600 to ~2,500 cells). If I proceed with the aggregated dataset, I will lose too much sequencing depth, which will hinder my search for the cancer stem cells. Another option is CellRanger aggr's '--normalize=none' argument, which merges the samples without equalizing the average read depth per cell. Could I then use Seurat's 'SCTransform' or 'NormalizeData' functions? I have performed QC on each sample separately, and each needs different filtering parameters (nFeature_RNA). The problem with this approach is that a single filter would then be applied to the combined dataset, which could let low-quality cells through for some of the samples.
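For context: the 10x documentation describes the default '--normalize=mapped' mode of 'cellranger aggr' as subsampling reads from higher-depth libraries until every library has, on average, the same number of confidently mapped reads per cell. The quantity being equalized is therefore mean mapped reads per cell, which depends on both how deeply a library was sequenced and how many cells it recovered. The Python sketch below only illustrates that rule by computing the fraction of reads each library would keep. The sample names and read and cell counts are made up, and aggr's own summary output is the authoritative record of what it actually did.

```python
"""Illustration of the depth-equalization rule behind 'cellranger aggr --normalize=mapped'.

Assumption: all library names and numbers below are hypothetical.
"""
from dataclasses import dataclass


@dataclass
class Library:
    name: str
    mapped_reads: int  # confidently mapped reads in this library
    n_cells: int       # number of cells called in this library

    @property
    def mean_reads_per_cell(self) -> float:
        return self.mapped_reads / self.n_cells


def subsampling_fractions(libs: list[Library]) -> dict[str, float]:
    """Fraction of reads each library keeps so that all match the lowest mean depth per cell."""
    target = min(lib.mean_reads_per_cell for lib in libs)
    return {lib.name: target / lib.mean_reads_per_cell for lib in libs}


if __name__ == "__main__":
    libs = [
        Library("tumour_A", mapped_reads=60_000_000, n_cells=600),
        Library("tumour_B", mapped_reads=75_000_000, n_cells=2_500),
        Library("tumour_C", mapped_reads=50_000_000, n_cells=1_000),
        Library("tumour_D", mapped_reads=90_000_000, n_cells=1_800),
    ]
    for name, frac in subsampling_fractions(libs).items():
        print(f"{name}: keep {frac:.0%} of reads")
```

With these made-up numbers, the 600-cell library has the most reads per cell and would keep only 30% of its reads, while the shallowest library keeps all of them. Setting '--normalize=none' skips this subsampling.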

Another option is to filter and normalize each sample individually and then use Seurat's 'merge' function. However, by default this merges only the raw count matrices and discards the normalized data matrices. There is an option to merge the normalized data matrices as well as the raw count matrices (merge.data = TRUE), but this should only be done if the same normalization approach was applied to all the objects. Does this mean the same normalization method, such as 'LogNormalize' or 'CLR', or does it mean the same filters?
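One way to see why "the same normalization approach" refers to the method rather than the filters: Seurat's LogNormalize divides each cell's counts by that cell's total, multiplies by a scale factor (10,000 by default) and applies log1p. Because each cell is normalized using only its own counts, normalizing samples separately and then merging gives the same values as merging first, as long as every sample uses the same method and scale factor. The Python sketch below re-implements that formula on toy counts to check this. It is an illustration, not Seurat's code, and the toy matrices are invented.

```python
import math

Cell = list[int]  # UMI counts for one cell over a shared, ordered gene list


def log_normalize(cells: list[Cell], scale_factor: float = 1e4) -> list[list[float]]:
    """Per-cell log1p(count / cell_total * scale_factor), the LogNormalize formula.

    Assumes no cell has a total of zero (such cells would be removed by QC).
    """
    out = []
    for counts in cells:
        total = sum(counts)
        out.append([math.log1p(c / total * scale_factor) for c in counts])
    return out


# Toy cells x genes count matrices for two hypothetical, already-filtered samples.
sample_1 = [[5, 0, 3], [10, 2, 0]]
sample_2 = [[1, 1, 8]]

# A: normalize each sample, then concatenate (analogous to merging with merge.data = TRUE).
separately = log_normalize(sample_1) + log_normalize(sample_2)
# B: concatenate the raw counts, then normalize once.
together = log_normalize(sample_1 + sample_2)

# A per-cell method cannot "see" other cells, so both routes agree exactly.
assert separately == together
```

This argument does not carry over to SCTransform, which fits its model across all cells in an object, so its output depends on which cells were modelled together.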

Sorry if this is a disjointed 'question'; I am just looking for the best way to combine my 4 datasets without losing too much data before continuing with downstream analyses.

Thank you so much in advance!!

Alex

Tags: CellRanger, Seurat, 10X, scRNAseq

Question written 2.3 years ago by Alex Gibbs ▴ 80 · updated 20 months ago by Quang

Unfortunately, there is at this time no "best" method. People are still trying to figure out how best to do this.

Reply · 2.3 years ago by swbarnes2 14k

score 2 · Accepted Answer · 2022-01-20

2.3 years ago

jared.andrews07 ★ 16k

Avoid CellRanger Aggregate.

Using Seurat merge with merge.data = TRUE is fine. It just means the same normalization method, not the same filters. You should define your filters for each sample separately.
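A minimal Python sketch of "define your filters for each sample separately": each sample gets its own nFeature_RNA window, cells are filtered within their own sample, and only the survivors are pooled. The sample names, threshold values and the list-of-dicts cell layout are hypothetical placeholders. In Seurat, the equivalent would typically be one subset() call per object before merge().

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureFilter:
    """Per-sample nFeature_RNA window (hypothetical values, chosen from each sample's QC plots)."""
    min_features: int
    max_features: int

    def keep(self, cell: dict) -> bool:
        return self.min_features <= cell["nFeature_RNA"] <= self.max_features


# Hypothetical thresholds: each sample has its own window.
FILTERS = {
    "tumour_A": FeatureFilter(min_features=300, max_features=6000),
    "tumour_B": FeatureFilter(min_features=200, max_features=4500),
}


def filter_then_pool(samples: dict[str, list[dict]]) -> list[dict]:
    """Filter each sample with its own thresholds, then pool the surviving cells."""
    pooled = []
    for name, cells in samples.items():
        cell_filter = FILTERS[name]
        # Record the sample of origin so it survives pooling (cf. merge's add.cell.ids).
        pooled.extend({**cell, "sample": name} for cell in cells if cell_filter.keep(cell))
    return pooled


if __name__ == "__main__":
    samples = {
        "tumour_A": [{"barcode": "A1", "nFeature_RNA": 250}, {"barcode": "A2", "nFeature_RNA": 3000}],
        "tumour_B": [{"barcode": "B1", "nFeature_RNA": 250}, {"barcode": "B2", "nFeature_RNA": 5000}],
    }
    for cell in filter_then_pool(samples):
        print(cell["barcode"], cell["sample"])
```

Here a cell with 250 detected features is dropped from tumour_A but kept in tumour_B, which is exactly the per-sample behaviour a single filter on an already-combined dataset cannot express.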

Answer · 2.3 years ago by jared.andrews07 ★ 16k

jared.andrews07, a colleague asked me about CellRanger (which I have never used): is aggr ever advantageous compared with loading the individual quantifications into R and doing the merging and normalization there? Can you comment? The whole CellRanger suite is so heavyweight that it's a pain to use.

Reply · 23 months ago by ATpoint 82k

I don't know of any, no. I guess it could save you a step reading stuff in, but I'd rather be explicit about what I'm cramming together and how.

Reply · 23 months ago by jared.andrews07 ★ 16k

Thank you for this, Jared. Much appreciated!!

Reply · 2.3 years ago by Alex Gibbs ▴ 80

Note that, depending on how different your samples are and on your end goals, you may want to look into integration methods. Personally, I feel most methods are too heavy-handed, but integration can be useful in some cases.

Reply · 2.3 years ago by jared.andrews07 ★ 16k

In your opinion, which integration method would be most worth me looking into or trying?

Reply · 2.3 years ago by Alex Gibbs ▴ 80

There are a boatload of them, and the benchmarking studies conflict. Harmony generally ranks reasonably well, and I tend to like Seurat's reciprocal PCA (RPCA) method, as it's more conservative and lets you adjust the strength of integration. I'm still waiting for a Bioconductor package to include that method.

Reply · 2.3 years ago by jared.andrews07 ★ 16k

Thank you @jared.andrews07 for the suggestions. Since SCTransform() corrects for sequencing depth on a per-cell basis, how do we account for differences in sequencing depth between datasets if we need to run SCTransform() on each dataset individually?

In my case, I want to correct for sequencing depth across the 6 datasets described here: https://github.com/satijalab/seurat/issues/6361. I would greatly appreciate your input. Thank you!
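One generic option for between-dataset depth differences, not something proposed in this thread, is to downsample every dataset to a shared depth before modelling each one, e.g. binomial thinning of each cell's UMIs so that every dataset's median UMIs per cell lands near the lowest median. A related idea appears in the documentation of Seurat's PrepSCTFindMarkers(), which uses the minimum of the per-model median UMI as a common depth when recorrecting counts from several SCT models, though it does this through the SCT models rather than by thinning. The Python sketch below shows plain binomial thinning only. It is an approximation of read-level downsampling, it is not what SCTransform does internally, and the toy data are invented.

```python
import random
import statistics


def thin_cell(counts: list[int], keep: float, rng: random.Random) -> list[int]:
    """Binomial thinning: retain each UMI independently with probability `keep`."""
    return [sum(rng.random() < keep for _ in range(c)) for c in counts]


def downsample_to_common_depth(datasets: dict[str, list[list[int]]], seed: int = 0):
    """Thin each dataset so its median UMIs per cell roughly matches the lowest median.

    Returns the thinned datasets and the target median depth.
    """
    rng = random.Random(seed)
    medians = {name: statistics.median(sum(cell) for cell in cells)
               for name, cells in datasets.items()}
    target = min(medians.values())
    thinned = {}
    for name, cells in datasets.items():
        keep = target / medians[name]  # 1.0 for the shallowest dataset, so it is unchanged
        thinned[name] = [thin_cell(cell, keep, rng) for cell in cells]
    return thinned, target


if __name__ == "__main__":
    toy = {"shallow": [[2, 3], [1, 1], [4, 4]], "deep": [[20, 30], [10, 10], [40, 40]]}
    thinned, target = downsample_to_common_depth(toy)
    print("target median UMIs per cell:", target)
    print("deep after thinning:", thinned["deep"])
```

Thinning discards real molecules, so it trades sensitivity for comparability; whether that trade is acceptable depends on how rare the populations of interest are.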

Reply · 20 months ago by Quang