Question

Cellranger aggr versus Seurat

1


4.0 years ago

swbarnes2 14k

I have 4 samples; two related tissues from two different donors. I ran cellranger count on all four samples, and used cellranger aggr to combine all the data.

Then I gave the filtered matrix data from each sample (not the matrix data from the aggregation) to Seurat and had it integrate the data.

The 10x aggr method puts each library in its own cluster. Seurat's integration puts all the cells from all the samples into one big cluster.

I was wondering if anyone had observed this before, or if anyone had an idea as to which UMAP is likely to be more reliable. I think that Seurat's algorithm is more sophisticated, but maybe the 10X people understand their data better, and their way is better for their libraries? Is there a way to change my command lines to make the two ways more similar?

10x Genomics command lines

cellranger count --id=donor1_type1 --fastqs=/projects/Illumina/200310_NB551398_0049_AHCN2KBGXC/mkfastq/outs/fastq_path/HCN2KBGXC/donor1_type1/ --transcriptome=/projects/Illumina/W/10xGenomics/refdata-cellranger-1.1.0/GRCh38_96/GRCh38/ --localcores=30

cellranger aggr --id=all_200319_aggregate --csv=all_200319_aggr.csv

Seurat R commands, taken from here: https://satijalab.org/seurat/v3.1/immune_alignment.html

data <- Read10X(data.dir = data_dir)
pbmc <- CreateSeuratObject(counts = data, project = "donor1_type1", min.cells = 3, min.features = 200)
pbmc <- NormalizeData(pbmc)
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)

immune.anchors <- FindIntegrationAnchors(object.list = list(donor1_type1, donor1_type2, donor2_type1, donor2_type2), dims = 1:20)
combined.all <- IntegrateData(anchorset = immune.anchors, dims = 1:20)
rm(immune.anchors)
DefaultAssay(combined.all) <- "integrated"
combined.all <- ScaleData(combined.all, verbose = FALSE)
combined.all <- RunPCA(combined.all, npcs = 30, verbose = FALSE)
combined.all <- RunUMAP(combined.all, reduction = "pca", dims = 1:20)
combined.all <- FindNeighbors(combined.all, reduction = "pca", dims = 1:20)
combined.all <- FindClusters(combined.all, resolution = 0.5)

10xGenomics 10XGenomics Seurat single cell • 8.5k views

ADD COMMENT • link updated 4.0 years ago by igor 13k • written 4.0 years ago by swbarnes2 14k

score 1 · Answer 1 · 2020-04-22

1


4.0 years ago

igor 13k

I was wondering if anyone had observed this before, or if anyone had an idea as to which UMAP is likely to be more reliable.

This is common. If you have multiple libraries, you will likely observe differences between those libraries in your UMAP unless you have very diverse populations in your libraries.

Is there a way to change my command lines to make the two ways more similar?

You are using integration with Seurat, which tries to reduce potential batch effects; that is why your samples end up overlapping. If you just merged the objects (merge()), your results would be more similar to Cell Ranger's. For Cell Ranger, there is an option to specify a batch for each library in your sample sheet (the aggregation CSV), which will make the results more similar to Seurat's integration. There is an official description here (I am not sure why they imply that batch effects are only visible with different library kits): https://kb.10xgenomics.com/hc/en-us/articles/360000559571-How-can-I-remove-batch-effects-among-samples-in-Cell-Ranger-
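As a rough sketch of the batch option, the aggregation CSV could carry an extra batch column. This assumes the Cell Ranger 3.x-era column names (library_id, molecule_h5, batch; newer releases call the first column sample_id), assumes each cellranger count run wrote its output under a directory named after its --id, and treats each donor as one batch. The relative paths and the new aggregate ID are illustrative, not taken from the original run.

```shell
# Write an aggregation CSV with an extra "batch" column.
# Assumptions: one batch per donor; molecule_info.h5 paths are relative to the
# directory where `cellranger count` was run (adjust to your own layout).
cat > all_200319_aggr.csv <<'EOF'
library_id,molecule_h5,batch
donor1_type1,donor1_type1/outs/molecule_info.h5,donor1
donor1_type2,donor1_type2/outs/molecule_info.h5,donor1
donor2_type1,donor2_type1/outs/molecule_info.h5,donor2
donor2_type2,donor2_type2/outs/molecule_info.h5,donor2
EOF

# Then rerun the aggregation under a new (hypothetical) ID so the original
# output is kept for comparison. Not executed here:
# cellranger aggr --id=all_200319_aggregate_batch --csv=all_200319_aggr.csv
```

Whether the batch column is honored depends on the Cell Ranger version, so it is worth checking the aggr documentation for the release you have installed.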

ADD COMMENT • link 4.0 years ago by igor 13k

0


If you have multiple libraries, you will likely observe differences between those libraries in your UMAP unless you have very diverse populations in your libraries.

I expect differences, but I would have hoped that the same tissues would cluster together.

With only 4 samples, I'm not sure that batch effect is really relevant. I guess each of the donors would be its own batch. I think these samples were received days apart, and they definitely would have used the same library prep kit.

So when would be the right circumstances to use Seurat merge versus Seurat integrate? Is integrate only for very disparate data sets, like datasets from different sources?

ADD REPLY • link 4.0 years ago by swbarnes2 14k

0


I use integration most of the time. It's very rare that you do not see any batch differences between samples. You have to see what makes sense for your experiment. Look at the data both ways and check the expression of important population markers. In the Seurat pancreas integration vignettes, there are a few plots showing UMAPs with and without integration. They are profiling very diverse populations, so the cell types do cluster together, but you can still see segregation within each population based on the library type.

ADD REPLY • link 4.0 years ago by igor 13k