CellRanger output more cells than specified using --force-cells? Why?
13 months ago

Hi

I have a query. I am trying to align my Plasmodium scRNA-seq data against a combined reference of the human and PF3D7 genomes. Since these cells are from the ring stage, the number of genes expressed is really low (25-50 genes per cell on average), and so are the UMI counts. This matches the neutrophil case study explained by 10X here. I am, however, perplexed: when I force the number of cells to be 10k, I get 17K cells as output. After removing human genes and filtering cells with the following criteria (CreateSeuratObject(counts = counts(seurat_data), project = name, min.cells = 3, min.features = 10)), I get a matrix of around 9K cells by 3K genes. Still, I am worried that something might be going wrong when the cells are estimated while aligning to multiple genomes?


scRNA-seq cellranger • 2.1k views

Cellular barcode detection happens independently of the genome(s) you use, based on my understanding. It basically counts how often each barcode is detected, and then the knee method is used to decide whether a barcode is likely to be real (because it is detected frequently) or rather due to noise. If you use the force option, you override all of this. Given the discrepancy between the detected (by knee) and forced numbers, I think you are counting a great deal of noisy (= artificial) cells/barcodes. I cannot comment on specifics of Plasmodium or these types of organisms in general, but with 50 genes per cell I wonder how usable the data are.

I would check whether this 9k-cell by 3k-gene matrix yields anything substantial, or whether it is just random counts across a lot of artificially counted cells. Thinking aloud here.
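
To make the difference between threshold-based calling and forcing concrete, here is a minimal Python sketch on toy data. It is not Cell Ranger's actual cell-calling code: `call_by_cutoff` is a hypothetical, simplified stand-in for a knee-style cutoff (keep barcodes within 10-fold of a robust maximum), and `force_cells` assumes that forcing simply keeps the top N barcodes by UMI count. All function names and numbers are made up for illustration.

```python
import numpy as np

def call_by_cutoff(umi_counts, expected_cells, quantile=0.99, fold=10):
    """Hypothetical simplified caller: keep barcodes whose UMI count is
    within `fold` of a robust maximum taken over the top `expected_cells`."""
    counts = np.asarray(umi_counts)
    top = np.sort(counts)[::-1][:expected_cells]
    robust_max = np.quantile(top, quantile)
    return counts >= robust_max / fold

def force_cells(umi_counts, n):
    """Assumed forcing behavior: keep the n barcodes with the highest
    UMI counts, ignoring any threshold."""
    counts = np.asarray(umi_counts)
    order = np.argsort(-counts, kind="stable")
    mask = np.zeros(counts.size, dtype=bool)
    mask[order[:n]] = True
    return mask

# Toy data: 50 "real" barcodes with 500 UMIs, 950 background barcodes with 5 UMIs.
umi_counts = np.concatenate([np.full(50, 500), np.full(950, 5)])

called = call_by_cutoff(umi_counts, expected_cells=50)
forced = force_cells(umi_counts, n=200)
forced_background = forced & ~called  # barcodes kept only because of forcing

print("called by cutoff:", int(called.sum()))
print("kept when forcing 200:", int(forced.sum()))
print("forced-in background barcodes:", int(forced_background.sum()))
```

In this toy case, every barcode that forcing adds beyond the cutoff comes from the background population, which is the concern raised above.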


Hi ATpoint, thanks for your input. Yes, I was skeptical about the data too in the beginning, but it is what it is: I have been told that at the time point at which the cells were harvested, the parasite expresses very few genes. So this behavior is expected.

13 months ago
bk11 ★ 3.0k

Did you run cellranger reanalyze on the output of the first run with the --force-cells=10000 option? Please see the link below: https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/reanalyze

See this section of the page in particular.


Hi @bk11. Thanks for your response. When I go to the link you shared, I see this:

The cellranger reanalyze command reruns secondary analysis performed on the feature-barcode matrix (dimensionality reduction, clustering, and visualization) using different parameter settings.

So apparently, running reanalyze reruns the secondary analysis rather than cell detection. Also, I am not sure how it explains why I get more cells than I specify with the --force-cells parameter.


I think the answer to my query is in the second paragraph of the post here.

4 months ago
scideas ▴ 30

I actually had this same issue with CellRanger v7. It turns out that, since you have 2 species in your mapping reference, CellRanger is forcing 10K for EACH species, resulting in 20K cells showing up in the barcode-rank plot. You can see that 1,824 cells are then subtracted from this value for the "Estimated Number of Cells" output in the top field, likely due to multiplet inference with their standard mixed-species analysis. Really odd behavior if you ask me; it seems like a bug.
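
A small Python sketch of what "forcing N for each species" would look like, under the assumption (my reading of the report, not confirmed Cell Ranger behavior) that the top N barcodes are picked separately per genome and then merged. The genome labels, counts and function names are hypothetical toy values.

```python
import numpy as np

def top_n_barcodes(counts, n):
    """Indices of the n barcodes with the highest UMI counts for one genome."""
    order = np.argsort(-np.asarray(counts), kind="stable")
    return set(order[:n].tolist())

def force_cells_per_genome(counts_by_genome, n):
    """Assumed behavior: force n barcodes per genome, then take the union."""
    selected = set()
    for counts in counts_by_genome.values():
        selected |= top_n_barcodes(counts, n)
    return selected

# Toy per-barcode UMI counts for 10 barcodes, split by genome.
counts_by_genome = {
    "human": [90, 80, 70, 1, 1, 1, 0, 0, 0, 0],
    "PF3D7": [0, 0, 60, 50, 40, 30, 1, 1, 1, 0],
}

n = 3
selected = force_cells_per_genome(counts_by_genome, n)
# Barcodes ranked in the top n for both genomes (candidate multiplets).
shared = top_n_barcodes(counts_by_genome["human"], n) & top_n_barcodes(counts_by_genome["PF3D7"], n)

print("forced per genome:", n)
print("barcodes selected in total:", len(selected))
print("selected for both genomes:", sorted(shared))
```

Under this assumption the total lands between N and 2N, because barcodes that rank highly for both genomes are counted once. Whether that is the same thing as the subtraction seen in the web summary is not something this sketch can confirm.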
