Question

Difference in reads per cell between samples

6 months ago

taylor.pio • 0

Hi all,

I have a Seurat object made up of several samples from different conditions. Some samples had many more cells loaded than others, and even after resequencing, these samples still have fewer reads per cell than the rest. We want to make sure that any clustering differences reflect biology and are not an artifact of lower read depth per cell. Is there a strategy for randomly selecting/removing reads so that all samples have about the same depth, and then checking that the clustering still shows biologically relevant changes? We expect these changes to reflect biological differences; however, some conditions had… We want to run our actual analysis on the UMAP with all reads, but we also want a QC step to confirm that the clustering differences seen on the UMAP are not driven by read count per cell.

Attached are UMAPs of the four conditions, feature plots of QC metrics, and read summaries from the HTML report. The dataset has been preprocessed with the Seurat pipeline, filtered for QC, and integrated with Harmony to remove cell-line batch effects.

I have seen discussions about randomly downsampling and removing cells, but I don't think that addresses my question.
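One way to check whether the clustering survives downsampling is to recluster the downsampled data and compare the two sets of cluster labels for the same cells, matched by barcode. A minimal sketch in Python, assuming the labels have been exported from Seurat as two equal-length lists in the same barcode order; the adjusted Rand index is used here as one common agreement score (1.0 means identical partitions, values near 0 mean chance-level agreement):

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_full, labels_downsampled):
    """Adjusted Rand index between two clusterings of the same cells.

    Both inputs are assumed to be in the same barcode order.
    """
    if len(labels_full) != len(labels_downsampled):
        raise ValueError("label lists must cover the same cells")
    n = len(labels_full)
    if n < 2:
        return 1.0
    # Pairs of cells placed together in both clusterings
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_full, labels_downsampled)).values())
    # Pairs of cells placed together in each clustering separately
    pairs_a = sum(comb(c, 2) for c in Counter(labels_full).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_downsampled).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (pairs_both - expected) / (maximum - expected)


# Same partition with different cluster IDs -> perfect agreement
print(adjusted_rand_index([0, 0, 1, 1], [5, 5, 2, 2]))  # 1.0
```

Cluster numbering usually changes between runs; the index ignores the label names and only compares which cells are grouped together.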

Thank you for any and all help!

seurat scrnaseq • 763 views

score 0 · Answer 1 · 2025-03-04

6 months ago

Bastien Hervé 6.5k

In single-cell analysis, one usually normalizes each cell's gene counts by that cell's total counts. Since you mentioned Seurat, this can be done with the function NormalizeData. The resulting normalized matrix is then used downstream for PCA, the neighbor graph, and clustering.

I don't know what your cells are, but in reality some cells are bigger than others, some express many genes and some only a few, so it can also make sense for them to cluster separately on your UMAPs.

What I would check for are potential doublets (possibly cluster 3): cells that express marker genes of multiple cell types.
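A crude version of that check flags cells that express markers from two or more cell types. A sketch in Python, assuming normalized expression is available as a dict of cell -> {gene: value}; the marker sets and the threshold below are hypothetical and must be replaced with markers for your own cell types. Dedicated doublet-detection tools are more reliable than this heuristic, which is only meant for inspecting a suspicious cluster.

```python
def flag_multi_lineage_cells(expression, marker_sets, threshold=0.0):
    """Return {cell: [cell types]} for cells expressing markers of >= 2 cell types.

    expression: {cell: {gene: normalized value}}
    marker_sets: {cell type: set of marker genes}  (hypothetical, user-supplied)
    A cell type counts as "expressed" if any of its markers exceeds `threshold`.
    """
    flagged = {}
    for cell, profile in expression.items():
        hit_types = sorted(
            cell_type
            for cell_type, markers in marker_sets.items()
            if any(profile.get(gene, 0.0) > threshold for gene in markers)
        )
        if len(hit_types) >= 2:
            flagged[cell] = hit_types
    return flagged


# Hypothetical marker sets for illustration only
markers = {"T cell": {"CD3E"}, "B cell": {"MS4A1"}, "Monocyte": {"LYZ"}}
cells = {
    "cell_1": {"CD3E": 2.1},
    "cell_2": {"CD3E": 1.8, "LYZ": 2.5},  # T-cell and monocyte markers together
}
print(flag_multi_lineage_cells(cells, markers))  # {'cell_2': ['Monocyte', 'T cell']}
```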

Also, try to find which genes drive each cluster to make sure they correspond to a well-known cell type. For example, if you don't find any genes specific to cluster 6, I would be doubtful about its validity, but if it expresses markers of a specific cell type you expected to find, it is likely legitimate.
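Finding the genes that drive a cluster can be sketched as a simple one-vs-rest ranking. In Seurat, FindMarkers/FindAllMarkers do this (with a Wilcoxon test by default); the Python below is a simplified stand-in, not Seurat's method. It ranks genes by the difference in mean log-normalized expression between the target cluster and all other cells, which approximates a log fold change. The input format ({cell: {gene: value}} plus {cell: cluster}) is an assumption.

```python
from statistics import mean


def rank_cluster_markers(expression, clusters, target, top_n=5):
    """Rank genes by mean(expression in target cluster) - mean(expression elsewhere).

    expression: {cell: {gene: log-normalized value}}, missing genes count as 0
    clusters:   {cell: cluster label}
    """
    in_cells = [c for c, k in clusters.items() if k == target]
    out_cells = [c for c, k in clusters.items() if k != target]
    if not in_cells or not out_cells:
        raise ValueError("need cells both inside and outside the target cluster")
    genes = sorted({g for profile in expression.values() for g in profile})
    scores = {
        g: mean(expression[c].get(g, 0.0) for c in in_cells)
        - mean(expression[c].get(g, 0.0) for c in out_cells)
        for g in genes
    }
    # Highest score first; ties keep alphabetical order because sorted() is stable
    return sorted(genes, key=lambda g: scores[g], reverse=True)[:top_n]


expr = {
    "a": {"GENE_X": 3.0, "GENE_Y": 0.1},
    "b": {"GENE_X": 2.5},
    "c": {"GENE_Y": 2.0},
    "d": {"GENE_Y": 1.5, "GENE_X": 0.2},
}
clus = {"a": "6", "b": "6", "c": "0", "d": "0"}
print(rank_cluster_markers(expr, clus, "6", top_n=1))  # ['GENE_X']
```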

Hi Bastien! Thank you for your help. These cells have already been normalized. I agree that cluster 6 may consist of doublets, and we are running doublet removal on the dataset.

However, cluster 0 is much larger in one particular condition, the one that had many more cells loaded and therefore fewer reads per cell. We want to make sure that cluster 0 is not an artifact of this difference in read count and is genuinely biologically different.
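To put a number on "much larger", one can compare the fraction of each condition's cells that fall in cluster 0, once on the full data and once after downsampling. A sketch in Python, assuming per-cell condition and cluster labels have been exported from the Seurat metadata as dicts keyed by barcode; the condition names below are placeholders.

```python
from collections import Counter


def cluster_fraction_by_condition(cell_condition, cell_cluster, cluster):
    """Fraction of each condition's cells assigned to `cluster`.

    cell_condition: {barcode: condition}
    cell_cluster:   {barcode: cluster label}
    """
    totals = Counter(cell_condition.values())
    in_cluster = Counter(
        cond for cell, cond in cell_condition.items() if cell_cluster.get(cell) == cluster
    )
    return {cond: in_cluster[cond] / n for cond, n in totals.items()}


conditions = {"c1": "ctrl", "c2": "ctrl", "c3": "treated", "c4": "treated"}
clusters = {"c1": "0", "c2": "1", "c3": "0", "c4": "0"}
print(cluster_fraction_by_condition(conditions, clusters, "0"))  # {'ctrl': 0.5, 'treated': 1.0}
```

If the lower-depth condition still shows a clearly larger cluster 0 fraction after all samples are brought to similar depth, that suggests depth alone does not explain it.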

Is there a way to subsample reads (or randomly delete reads), then regenerate the UMAP and check that these clustering differences still exist?
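At the count-matrix level, one way to do this is to downsample each cell's UMI counts to a common total by sampling molecules without replacement, then rerun normalization, clustering and UMAP on the result. A sketch in pure Python, assuming one cell's counts are given as {gene: UMI count}; the target depth is a parameter you would choose yourself (for example, near the lowest typical depth among samples). Note that this thins UMIs rather than raw reads; subsampling raw reads has to happen upstream, before counting.

```python
import random
from collections import Counter


def downsample_cell(counts, target, rng):
    """Randomly keep `target` UMIs from one cell, without replacement.

    counts: {gene: UMI count}; cells already at or below `target` are returned unchanged.
    """
    total = sum(counts.values())
    if total <= target:
        return dict(counts)
    # Expand to one entry per molecule, then sample molecules uniformly
    molecules = [gene for gene, n in counts.items() for _ in range(n)]
    return dict(Counter(rng.sample(molecules, target)))


def downsample_matrix(matrix, target, seed=0):
    """Apply downsample_cell to every cell in {barcode: {gene: count}}."""
    rng = random.Random(seed)
    return {barcode: downsample_cell(counts, target, rng) for barcode, counts in matrix.items()}


matrix = {"deep_cell": {"GENE_A": 300, "GENE_B": 700}, "shallow_cell": {"GENE_A": 40, "GENE_B": 60}}
thinned = downsample_matrix(matrix, target=100)
print({bc: sum(c.values()) for bc, c in thinned.items()})  # {'deep_cell': 100, 'shallow_cell': 100}
```

Downsampling every cell to one total removes all depth variation; thinning each sample by a single proportion instead would keep the within-sample spread while equalizing the sample averages.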

Reply by taylor.pio • 6 months ago

NormalizeData already mitigates differences in read counts. From the Seurat documentation:

Feature counts for each cell are divided by the total counts for that cell and multiplied by the scale.factor
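The quoted step belongs to Seurat's default LogNormalize method, which then applies a natural log via log1p; scale.factor defaults to 10,000. For one cell it can be written out as below. This is a Python re-implementation for illustration, not Seurat's code:

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Seurat-style LogNormalize for one cell: log1p(count / total * scale_factor)."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: math.log1p(n / total * scale_factor) for gene, n in counts.items()}


deep = {"GENE_A": 10, "GENE_B": 30}
shallow = {"GENE_A": 1, "GENE_B": 3}  # same proportions, 10x fewer counts
print(log_normalize(deep) == log_normalize(shallow))  # True
```

The scaling removes differences in total counts, but it cannot recover genes that were never detected in shallowly sequenced cells, so lower-depth cells still carry more zeros; that residual effect is what a downsampling comparison would test.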

You could also try cellranger aggr. From the 10x Genomics documentation:

When combining data from multiple GEM wells, the cellranger aggr pipeline automatically equalizes the average read depth per cell between groups before merging
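The idea behind that depth equalization can be sketched as: compute the mean reads per cell in each library, take the lowest as the target, and keep each read of a deeper library with probability target / depth. The Python below only computes those subsampling fractions from summary numbers. It is a simplified illustration of the concept, not Cell Ranger's implementation (the exact procedure is described on the linked page), and the library names are placeholders.

```python
def depth_subsampling_fractions(total_reads, cell_counts):
    """Fraction of reads to keep per library so all match the lowest mean reads per cell.

    total_reads: {library: reads}; cell_counts: {library: number of cells}
    """
    mean_depth = {lib: total_reads[lib] / cell_counts[lib] for lib in total_reads}
    target = min(mean_depth.values())
    return {lib: target / depth for lib, depth in mean_depth.items()}


fractions = depth_subsampling_fractions(
    total_reads={"sample_A": 400_000_000, "sample_B": 200_000_000},
    cell_counts={"sample_A": 10_000, "sample_B": 20_000},
)
print(fractions)  # {'sample_A': 0.25, 'sample_B': 1.0}
```

In the situation described in the question, the sample with many more cells loaded (and fewer reads per cell) would set the target, and the other samples would be thinned toward it.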

https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/running-pipelines/cr-3p-aggr#depth_normalization