Question

Discrepancy in gene expression between cellranger and velocyto pipeline

3 months ago

sarang.chafle • 0

Hello everyone! I am hoping to get some help with implementing splicing-based cell velocity/trajectory analysis. I have FASTQ files from sequencing scRNA libraries prepared with the 10x Universal 5' Gene Expression kit, along with VDJ sequencing.

I produced count matrices with Cell Ranger for general analysis in Seurat, and the quality of the data seems fine. Next, I generated spliced and unspliced count matrices using velocyto's "run10x" command with the reference genome annotation from 10x Genomics (refdata-gex-GRCh38-2020-A/genes/genes.gtf), and analyzed the resulting loom file in scanpy. After the usual pre-processing and analysis steps, I wanted to do a sanity check by plotting "CD3E" and "CD3D" expression on a UMAP, since T cells are CD3+ and CD4+ but CD4 transcripts are generally not well represented. I was very surprised to see that CD3E and CD3D are expressed in very few cells. When I checked the same genes in the Seurat object, it showed uniformly high expression of both CD3D and CD3E, as expected.
I suspect the problem lies in the counting part of the pipeline, so I tried STARsolo as an alternative to velocyto for counting, but I ended up with the same result.

Below are some checks I performed on my anndata object:

"CD3D" in BT6_a.var_names

True

BT6_a[:, "CD3D"].X.sum()

80

As you can see, the total CD3D count summed over all cells is only 80, so at most 80 cells express CD3D. As possible fixes, I tried the multimapping option in STARsolo and the updated genome from 10x, but nothing has worked so far. If anyone has implemented velocity/trajectory analysis, I would love to connect. Please let me know what further information I can provide to solve this issue. Thanks!
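The check above sums counts, which is not quite the same as counting cells, and it only looks at one matrix. A minimal sketch of a fuller check follows, reporting both the total counts and the number of expressing cells for one gene in each layer. The function name `summarize_gene`, the toy matrices, and the use of gene index 0 as a stand-in for CD3D are all hypothetical and only for illustration. It also assumes dense cells x genes matrices. Whether `.X` in the loaded object holds only spliced counts depends on how the loom file was read, so that is worth confirming separately.

```python
import numpy as np


def summarize_gene(layers, gene_index):
    """Return {layer_name: (total_counts, n_expressing_cells)} for one gene.

    `layers` maps a layer name to a dense cells x genes count matrix.
    This is an illustrative helper, not part of scanpy, scvelo, or velocyto.
    """
    summary = {}
    for name, matrix in layers.items():
        # Pull out the per-cell counts for the gene of interest.
        column = np.asarray(matrix)[:, gene_index]
        total_counts = int(column.sum())          # what `.X.sum()` reports
        n_expressing = int((column > 0).sum())    # cells with at least one count
        summary[name] = (total_counts, n_expressing)
    return summary


# Toy data: 5 cells x 2 genes; gene 0 stands in for CD3D (hypothetical values).
toy_layers = {
    "spliced": np.array([[0, 1], [3, 0], [0, 2], [0, 0], [1, 4]]),
    "unspliced": np.array([[2, 0], [1, 0], [0, 0], [5, 1], [0, 0]]),
}

result = summarize_gene(toy_layers, gene_index=0)
for name, (total, n_cells) in result.items():
    print(f"{name}: total counts = {total}, expressing cells = {n_cells}")
```

With a real AnnData object, the same idea would apply to each entry of its layers for the CD3D column; sparse matrices would first need converting to dense for that single column. If most CD3D counts turn up outside the spliced layer, that would point to read classification rather than alignment as the source of the discrepancy.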

Plots illustrating the issue

velocyto scvelo seurat scRNA • 546 views

score 1 · Answer 1 · 2025-07-21

3 months ago

Rob 7.2k

You might be interested in this paper: "Quantification of spliced and unspliced transcripts by velocyto is inaccurate for 5’-sequencing data". In short, it seems one should avoid using velocyto for this type of data.

Hello Rob, thanks for your reply. The article you linked is interesting and does resolve part of the problem. I am currently testing the salmon/alevin-fry tutorial for RNA velocity (alevin-fry-velocity), but the initial observations are the same as with the velocyto pipeline (CD3D expression is almost nonexistent).

alevin-fry

Do you have a recommendation for a package or pipeline for this particular analysis?

Reply by sarang.chafle, 3 months ago