Hello everyone! I am hoping to get some help with implementing splicing-based cell velocity/trajectory analysis. I have FASTQ files generated by sequencing scRNA-seq libraries prepared using Universal 5' Gene Expression along with VDJ sequencing.
I produced count matrices using Cell Ranger and ran some general analysis in Seurat; the quality of the data seems fine. Next, I generated spliced and unspliced count matrices using velocyto's "run10x" command with the reference genome annotation from 10x Genomics (refdata-gex-GRCh38-2020-A/genes/genes.gtf), and analyzed the resulting loom file in scanpy. After the normal pre-processing and analysis steps, I wanted to do a sanity check by plotting "CD3E" and "CD3D" expression on a UMAP, as T cells are CD3+ and CD4+ but CD4 transcripts are generally not well represented. I was very surprised to see that CD3E and CD3D are expressed in very few cells. I then checked the same genes in the Seurat object and, as expected, it shows uniformly high expression of both CD3D and CD3E.
I suspect the problem lies in the counting part of the pipeline, so I tried STARsolo as an alternative to velocyto for counting, but I ended up with the same result.
Below are some checks I performed on my anndata object:
"CD3D" in BT6_a.var_names
True
BT6_a[:, "CD3D"].X.sum()
80
As you can see, the summed CD3D expression across all cells is only 80, so very few cells are expressing CD3D. As possible solutions, I tried the multimapping option in STARsolo and the updated genome from 10x, but nothing has worked so far. If anyone has implemented velocity/trajectory analysis, I would love to connect. Please let me know what further information I can provide to solve this issue. Thanks!
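For reference, here is a minimal sketch of a more detailed version of the check above. `.X.sum()` returns the summed expression for a gene rather than the number of cells expressing it, so this reports both. The helper `gene_summary` and the toy matrix are hypothetical, made up for illustration. The layer names in the commented usage ("spliced", "unspliced", "ambiguous") are an assumption about what the loom file contains.

```python
import numpy as np

def gene_summary(matrix, var_names, gene):
    """Return (summed expression, number of expressing cells) for one gene.

    matrix: cells x genes, dense NumPy array or SciPy sparse matrix.
    var_names: gene names, one per column of `matrix`.
    """
    idx = list(var_names).index(gene)
    col = matrix[:, idx]
    # Sparse columns are densified before the element-wise comparison.
    if hasattr(col, "toarray"):
        col = col.toarray()
    col = np.asarray(col).ravel()
    return float(col.sum()), int((col > 0).sum())

# Toy data (hypothetical): 4 cells x 2 genes.
toy_genes = ["CD3D", "CD3E"]
toy_counts = np.array([[50, 0],
                       [30, 2],
                       [0, 0],
                       [0, 1]])

total, n_cells = gene_summary(toy_counts, toy_genes, "CD3D")
print(f"CD3D: total={total}, expressing cells={n_cells}")

# Possible usage on the AnnData object from the post (layer names assumed):
# print("X", gene_summary(BT6_a.X, BT6_a.var_names, "CD3D"))
# for name in ["spliced", "unspliced", "ambiguous"]:
#     if name in BT6_a.layers:
#         print(name, gene_summary(BT6_a.layers[name], BT6_a.var_names, "CD3D"))
```

In the toy data a total of 80 comes from only 2 cells, which is why the two numbers are worth reporting separately. Running the same summary per layer would show whether the CD3D reads were assigned to a layer other than the one loaded into `.X`.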
Hello Rob, thanks for your reply. The article you provided is interesting and does resolve part of the problem. I am currently testing the salmon/alevin-fry tutorial for RNA velocity (alevin-fry-velocity), but the initial observations are the same as with the velocyto pipeline (CD3D expression is almost nonexistent).
Do you have a recommendation for a package or pipeline for this particular analysis?