Discrepancy in gene expression between cellranger and velocyto pipeline
6 weeks ago

Hello everyone! I am hoping to get some help with splicing-based RNA velocity/trajectory analysis. I have FASTQ files from scRNA-seq libraries prepared with Universal 5' Gene Expression together with V(D)J sequencing.

I produced count matrices with Cell Ranger for a general analysis in Seurat, and the data quality seems fine. Next, I generated spliced and unspliced count matrices with velocyto's "run10x" command, using the reference genome annotation from 10x Genomics (refdata-gex-GRCh38-2020-A/genes/genes.gtf), and analyzed the resulting loom file in scanpy. After the usual preprocessing and analysis steps, I did a sanity check by plotting CD3E and CD3D expression on a UMAP, since T cells are CD3+ and CD4+ (although CD4 transcripts are generally poorly captured). I was very surprised to see CD3E and CD3D expressed in very few cells. When I checked the same genes in Seurat, the Seurat object showed uniformly high expression of both CD3D and CD3E, as expected.
I suspect the problem lies in the counting step of the pipeline, so I tried STARsolo as an alternative to velocyto for counting, but I got the same result.
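One way to narrow down where the counts go missing is to compare the same gene cell by cell between the Cell Ranger matrix and the velocyto (or STARsolo) matrix. The tools name cells differently, so this minimal sketch first reduces every cell name to its 16-nt barcode. `core_barcode` and `compare_gene` are hypothetical helpers, not part of any tool's API, and the assumed name formats (Cell Ranger `AAACCTGAGAAACCAT-1`, velocyto loom `sample:AAACCTGAGAAACCATx`) should be checked against your own objects.

```python
import re

import numpy as np

# Assumed cell-name formats (check against your own data):
#   Cell Ranger / Seurat: "AAACCTGAGAAACCAT-1"
#   velocyto loom:        "sample:AAACCTGAGAAACCATx"
# Both contain the same 16-nt 10x cell barcode, used here as the join key.
_BARCODE = re.compile(r"[ACGTN]{16}")


def core_barcode(name):
    """Return the first run of 16 nucleotide characters in a cell name."""
    match = _BARCODE.search(name)
    if match is None:
        raise ValueError(f"no 16-nt barcode found in {name!r}")
    return match.group(0)


def compare_gene(counts_a, names_a, counts_b, names_b):
    """Align one gene's per-cell counts from two pipelines on shared barcodes.

    counts_a, counts_b: 1-D per-cell counts of the same gene from each tool.
    names_a, names_b:   cell names, in the same order as the counts.
    Returns (shared_barcodes, aligned_counts_a, aligned_counts_b).
    """
    index_a = {core_barcode(n): i for i, n in enumerate(names_a)}
    index_b = {core_barcode(n): i for i, n in enumerate(names_b)}
    # Keep only cells called by both pipelines, in a stable sorted order.
    shared = sorted(index_a.keys() & index_b.keys())
    a = np.asarray(counts_a)[[index_a[bc] for bc in shared]]
    b = np.asarray(counts_b)[[index_b[bc] for bc in shared]]
    return shared, a, b
```

The inputs would be the per-cell CD3D counts (as 1-D arrays) and the matching `obs_names` from the Cell Ranger-based object and from `BT6_a`. If many shared cells have CD3D counts in the Cell Ranger matrix but zeros in the loom, the counts are likely being lost at the counting step rather than during preprocessing.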

Below are some checks I performed on my AnnData object:

>>> "CD3D" in BT6_a.var_names
True
>>> BT6_a[:, "CD3D"].X.sum()
80

As you can see, CD3D has only 80 counts in total across all cells (.X.sum() sums over every cell), so with raw integer counts at most 80 cells can be expressing it. As possible fixes, I tried the multimapping option in STARsolo and the updated genome reference from 10x, but nothing has worked so far. If anyone has implemented velocity/trajectory analysis, I would love to connect. Please let me know what further information I can provide to solve this issue. Thanks!
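Because `BT6_a[:, "CD3D"].X.sum()` returns the total count summed over all cells rather than the number of expressing cells, it can help to report both numbers. The sketch below assumes `X` is a cells × genes matrix of raw counts, either a dense NumPy array or a SciPy sparse matrix as AnnData stores it; `gene_summary` is a hypothetical helper name. For a velocyto loom read into scanpy/scVelo, `.X` usually mirrors the spliced counts only, so running the same check on each layer may also be informative.

```python
import numpy as np


def gene_summary(X, var_names, gene):
    """Return (total counts, number of cells with count > 0) for one gene.

    X: cells x genes matrix (dense NumPy array or SciPy sparse matrix).
    var_names: gene names in the same order as the columns of X.
    """
    names = list(var_names)
    if gene not in names:
        raise KeyError(f"{gene!r} not found in var_names")
    col = X[:, names.index(gene)]
    # SciPy sparse slices expose .toarray(); dense slices are used as-is.
    if hasattr(col, "toarray"):
        col = col.toarray()
    col = np.asarray(col).ravel()
    total = col.sum()
    n_cells = int(np.count_nonzero(col > 0))
    return total, n_cells
```

For example, `gene_summary(BT6_a.X, BT6_a.var_names, "CD3D")`. A total of 80 can come from very few cells: a CD3D column of `[40, 0, 40]` gives a total of 80 but only 2 expressing cells.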

Plots illustrating the issue

velocyto scvelo seurat scRNA
6 weeks ago
Rob

You might be interested in this paper: "Quantification of spliced and unspliced transcripts by velocyto is inaccurate for 5'-sequencing data". In short, it seems one should avoid using velocyto for this type of data.


Hello Rob, thanks for your reply. The article you linked is interesting and explains part of the problem. I am currently testing the salmon/alevin-fry RNA velocity tutorial (alevin-fry-velocity), but my initial observations match the velocyto pipeline: CD3D expression is almost nonexistent.

alevin-fry
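Velocity pipelines split each gene's reads into spliced, unspliced and (for velocyto) ambiguous counts, so a gene that looks absent in `.X` may have its reads sitting in another layer. The hypothetical helper below sums one gene's counts in each layer and reports the fractions. It is a minimal sketch assuming the layers are cells × genes matrices (dense NumPy or SciPy sparse) keyed by name, as in `adata.layers`; the layer names are assumptions, so use whatever your object actually contains.

```python
import numpy as np


def layer_breakdown(layers, j):
    """Sum gene column j in each layer and return (totals, fractions).

    layers: mapping of layer name -> cells x genes count matrix.
    j: integer column index of the gene of interest.
    """
    totals = {}
    for name, matrix in layers.items():
        col = matrix[:, j]
        # Densify SciPy sparse slices; dense slices pass through unchanged.
        if hasattr(col, "toarray"):
            col = col.toarray()
        totals[name] = float(np.asarray(col).sum())
    grand = sum(totals.values())
    # Guard against division by zero when the gene has no counts at all.
    fractions = {k: (v / grand if grand else 0.0) for k, v in totals.items()}
    return totals, fractions
```

With an AnnData object this could be called as `layer_breakdown({k: adata.layers[k] for k in ("spliced", "unspliced", "ambiguous")}, adata.var_names.get_loc("CD3D"))`. Comparing the sum over all layers with the Cell Ranger count for the same gene may indicate whether the reads were lost entirely or only assigned to a category other than the one in `.X`.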

Do you have a recommended package or pipeline for this particular analysis?
