RNA velocity for smart-seq2 dataset
4 months ago
Dhanusha • 0

I tried RNA velocity with alevin on my Smart-seq2 dataset, but the log files show "Total 62.3886% of reads will be thrown away because of noisy Cellular barcodes". I also can't use kallisto | bustools, because the reads do not have UMI barcodes. Any suggestions on how to proceed with RNA velocity for a Smart-seq2 dataset?

Smart-seq2 velocity scVelo RNA • 989 views
4 months ago
predeus ★ 1.7k

Smart-seq2 does not have any barcodes; are you sure you're processing it correctly? STARsolo can process Smart-seq2 datasets now. Trimming adapters with something like bbduk.sh can dramatically improve the mapping rate, since Smart-seq2 reads often contain a lot of adapter sequence.


That's the concern: since Smart-seq2 doesn't have UMI barcodes, I am not sure whether the reads are being processed correctly. So you are suggesting that STARsolo can detect spliced and unspliced mRNA from Smart-seq2 data? I will try this. Thank you!


FYI, the devel version of kallisto | bustools can process smart-seq2 data (you'll need the devel version of kb-python and the devel version of kallisto). Happy to discuss this further.


I have tried kallisto | bustools for the Smart-seq2 data, and kb count seems to be working. How should I proceed from here? The following output files were generated: adata.h5ad, bacth.txt, genes.txt, kb_info.json, matrix.abundance.mtx, matrix.cells, matrix.ec, matrix.fld.tsv, matrix.tcc.mtx, transcripts.txt, run_info.json


By the way, the stable version of kallisto | bustools now includes smart-seq2 (no need to use the devel version).

In any case, it seems you have it working for the standard analysis, where introns aren't included in the reference. However, to produce an RNA velocity analysis, you must include introns. This means you must run both "kb ref" and "kb count" with the option: --workflow lamanno. Note: including introns in the index makes kallisto use a lot of memory, so make sure you have plenty of RAM (~70 GB).

After that, you'll get both "spliced" and "unspliced" matrices. You get a loom file (be sure to specify --loom when running kb count) containing the two, which you can plug into a standard RNA velocity workflow.
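As a rough sketch of the two calls described above, the Python snippet below only assembles the command lines; it does not run kb. The file names, the `prefix` argument, and the helper names `kb_ref_command` and `kb_count_command` are placeholders made up for illustration. The -i, -g, -f1, -f2, -c1 and -c2 options are the usual kb-python options for the index, transcript-to-gene map, cDNA/intron FASTAs and their transcript lists, but check `kb ref --help` and `kb count --help` for your installed version. Whether a given technology string is accepted together with this workflow also depends on the kb-python version.

```python
import shlex


def kb_ref_command(genome_fasta, gtf, prefix="velocity"):
    """Build a `kb ref` call that includes introns (--workflow lamanno).

    All output file names derived from `prefix` are placeholders.
    """
    return [
        "kb", "ref",
        "-i", f"{prefix}.idx",             # kallisto index to create
        "-g", f"{prefix}_t2g.txt",         # transcript-to-gene map
        "-f1", f"{prefix}_cdna.fa",        # cDNA FASTA to create
        "-f2", f"{prefix}_intron.fa",      # intron FASTA to create
        "-c1", f"{prefix}_cdna_t2c.txt",   # cDNA transcript list
        "-c2", f"{prefix}_intron_t2c.txt", # intron transcript list
        "--workflow", "lamanno",
        genome_fasta, gtf,
    ]


def kb_count_command(fastqs, technology, out_dir, prefix="velocity"):
    """Build the matching `kb count` call, asking for a loom file."""
    return [
        "kb", "count",
        "-i", f"{prefix}.idx",
        "-g", f"{prefix}_t2g.txt",
        "-x", technology,                  # technology string, version dependent
        "-o", out_dir,
        "-c1", f"{prefix}_cdna_t2c.txt",
        "-c2", f"{prefix}_intron_t2c.txt",
        "--workflow", "lamanno",
        "--loom",
        *fastqs,
    ]


# Placeholder inputs: print the commands so they can be reviewed before running.
print(shlex.join(kb_ref_command("genome.fa", "genes.gtf")))
print(shlex.join(kb_count_command(["R1.fastq.gz", "R2.fastq.gz"], "10XV3", "kb_out")))
```

Once the printed commands look right, each list can be passed to `subprocess.run(cmd, check=True)`.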


It gives an error message saying that SMARTSEQ cannot be used with workflow lamanno. Any suggestions?


Try installing the devel version of kb-python:

pip install git+https://github.com/pachterlab/kb-python@devel

Then try the lamanno workflow again.


I was having the same issue as Dhanusha, and the devel version of kb-python works fine! Thank you, dsull.


OK, you're right, that option is currently not available. I'll try to get that fixed when I can.

In the meantime, you can run the workflow without kb-python, although it won't be as straightforward.

Here's what you need to do to get spliced/unspliced matrices:

Try running a 10XV3 RNA velocity analysis in kb-python using the --dry-run option. It'll show you a list of commands to get an RNA velocity analysis going.

Use those exact commands for your smart-seq data EXCEPT add -m and --cm whenever running bustools count and ignore the "bustools whitelist" and "bustools correct" commands.

For example, for the step "bustools correct -o ./tmp/output.s.c.bus -w ./whitelist.txt ./tmp/output.s.bus", you'll want to run "mv ./tmp/output.s.bus ./tmp/output.s.c.bus" instead (effectively skipping that step).

I'm sorry this isn't that straightforward but hopefully kb-python will be updated to fix this soon.
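If the --dry-run output is long, the substitutions described above can be applied mechanically. The Python sketch below is only an illustration: the helper name `adapt_dry_run_commands` is made up, it assumes each dry-run command is a single line starting with the tool name, and it assumes the input .bus file is the last argument of "bustools correct", as in the example above. It drops "bustools whitelist", turns "bustools correct" into the equivalent "mv", and adds -m and --cm to every "bustools count". Review the resulting commands before running them.

```python
import shlex


def adapt_dry_run_commands(commands):
    """Rewrite 10XV3 dry-run commands for barcode-free Smart-seq2 data."""
    adapted = []
    for command in commands:
        tokens = shlex.split(command)
        if tokens[:2] == ["bustools", "whitelist"]:
            # No barcodes, so there is no whitelist to generate.
            continue
        if tokens[:2] == ["bustools", "correct"]:
            # Skip barcode correction: rename the input to the expected output.
            output_bus = tokens[tokens.index("-o") + 1]
            input_bus = tokens[-1]  # assumed to be the last argument
            adapted.append(shlex.join(["mv", input_bus, output_bus]))
            continue
        if tokens[:2] == ["bustools", "count"]:
            # Add -m and --cm unless they are already present.
            extra = [flag for flag in ("-m", "--cm") if flag not in tokens]
            adapted.append(shlex.join(tokens[:2] + extra + tokens[2:]))
            continue
        # Every other command (kallisto bus, bustools sort, ...) is kept as-is.
        adapted.append(command)
    return adapted


# Illustrative input; real dry-run output will have different paths and options.
example = [
    "bustools whitelist -o ./whitelist.txt ./tmp/output.s.bus",
    "bustools correct -o ./tmp/output.s.c.bus -w ./whitelist.txt ./tmp/output.s.bus",
    "bustools count -o ./counts/spliced -g t2g.txt -e matrix.ec -t transcripts.txt --genecounts ./tmp/output.s.c.bus",
]
for line in adapt_dry_run_commands(example):
    print(line)
```

`shlex.join` requires Python 3.8 or later.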