I have 10x sc RNA-seq data for which I want to do an RNA velocity analysis using scvelo and CellRank. However, the bottleneck for me is generating the spliced/unspliced counts from the aligned data (BAM files or 10x output). I've found two tools that are able to do this:
(1) Velocyto. The original tool for calculating spliced/unspliced counts. The CLI tool is simple to use and has an option
velocyto run10x which should work directly on 10x data. One potential issue with velocyto is that is seems to be no longer updated. The last commits to the git repository are from ~3 years ago (as of 07/2021).
(2) STARsolo. The updated version of STAR has a built-in tool to calculate spliced/unspliced ratios. Basically adding
--soloFeatures Gene Velocyto as option should produce this result. The issue is that the output has a different form (spliced.mtx and unspliced.mtx files) from the .loom files required for scvelo.
Hence neither of the tools is working for me, even after quite some troubleshooting (a few of the error files attached).
(1) How does one convert the output of STARsolo into a form that can be read in python by scvelo (e.g. a loom file like velocyto does)?
(2) Are there are other tools out there, that can generate splicing data for sc-RNAseq data from 10x, which are directly compatible with scvelo?
Velocyto run 10x
I created a conda environment with velocyto and all its dependencies and simply ran
velocyto run10x -vvv --dtype uint32 $DATA10x_PATH $GTF_PATH.
OMP: Info #270: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
2021-07-06 18:17:21,016 - ERROR - This is an older version of cellranger, cannot check if the output are ready, make sure of this yourself
2021-07-06 18:17:21,016 - ERROR - Can not locate the barcodes.tsv file!
Traceback (most recent call last):
File "/Users/*/opt/miniconda3/envs2/velocyto/bin/velocyto", line 11, in <module>
File "/Users/*/opt/miniconda3/envs2/velocyto/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
return self.main(*args, **kwargs)
File "/Users/*/opt/miniconda3/envs2/velocyto/lib/python3.9/site-packages/click/core.py", line 1062, in main
rv = self.invoke(ctx)
File "/Users/*/opt/miniconda3/envs2/velocyto/lib/python3.9/site-packages/click/core.py", line 1668, in invoke
File "/Users/*/opt/miniconda3/envs2/velocyto/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/*/opt/miniconda3/envs2/velocyto/lib/python3.9/site-packages/click/core.py", line 763, in invoke
return __callback(*args, **kwargs)
File "/Users/*/opt/miniconda3/envs2/velocyto/lib/python3.9/site-packages/velocyto/commands/run10x.py", line 91, in run10x
bcfile = bcmatches
IndexError: list index out of range
The 10x output (barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz) are in subfolder
count/sample_feature_bc_matrix, rather than
outs/filtered_gene_bc_matrices as required by velocyto. However, renaming these folders does not solve the problem.
I also tried running velocyto directly on the BAM files, similar to
velocyto run -b filtered_barcodes.tsv -o output_path -m repeat_msk_srt.gtf possorted_genome_bam.bam mm10_annotation.gtf. However, here I get a Permission error with
This gets more complicated with more options, but I was able to run
STAR --genomeDir $GENOME_PATH \
--runThreadN 12 \
--readFilesIn $FASTQ_FILES_1 $FASTQ_FILES_2 \
--outFileNamePrefix $RESULTS_PATH \
--outSAMtype BAM SortedByCoordinate \
--outSAMattributes Standard \
--outFilterType BySJout \
--outFilterMultimapNmax 1 \
--outFilterMismatchNmax 1 \
--outFilterIntronMotifs RemoveNoncanonical \
--outSJfilterOverhangMin 30 20 20 20 \
--outSJfilterCountUniqueMin 3 2 2 2 \
--clipAdapterType CellRanger4 --outFilterScoreMin 30 \
--soloType CB_UMI_Simple \
--soloFeatures Gene Velocyto \
--soloMultiMappers Unique \
--soloCBwhitelist $CB_WHITE_LIST \
--soloUMIlen 12 \
--soloCBmatchWLtype 1MM --soloUMIfiltering MultiGeneUMI_All --soloUMIdedup 1MM_CR \
--soloCellFilter EmptyDrops_CR 3000 0.99 10 45000 90000 5000 0.01 20000 0.01 10000
without problems and generate most of the alignment required. It generated a separate folder for the velocyto files, with files
unspliced.mtx. However, as mentioned I had problems working with these files in scvelo.