Kallisto bustools index
6 weeks ago
Shawn • 0

I am trying to build an index for a single-nucleus experiment using Kallisto, and I was hoping someone could help break down the following kb ref command:

I am a bit confused about the significance of t2g.txt, cdna_t2c.txt, and intron_t2c.txt.

I am also not 100% sure about the difference between the lamanno and nucleus options for --workflow.

kb ref -i index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa \
-c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno -n 8 \
Mus_musculus.GRCm38.dna.primary_assembly.fa.gz Mus_musculus.GRCm38.98.gtf.gz

6 weeks ago
dsull ★ 2.1k

Those files are produced by kb ref, and you then feed them into kb count.
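A minimal dry-run sketch of that hand-off: the function below only prints the kb count command that would consume the kb ref outputs, so it can be tried without kb installed. The flag names (-i, -g, -x, -o, --workflow, -c1, -c2) are assumed to match kb_python releases that still offer the lamanno and nucleus workflows; check `kb count --help` for your version. The technology string 10xv3, the output directory out, and the FASTQ names are placeholders, and kb_count_cmd is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (not run) a kb count command that consumes kb ref outputs.
# Assumptions: flag names as in kb_python releases with --workflow lamanno/nucleus;
# "10xv3", "out", and the FASTQ names are placeholders for your own data.
set -euo pipefail

# Hypothetical helper: $1 is the workflow, remaining args are the FASTQ files.
kb_count_cmd() {
  local workflow="$1"
  shift
  local -a cmd
  cmd=(kb count -i index.idx -g t2g.txt -x 10xv3 -o out
       --workflow "$workflow" -c1 cdna_t2c.txt -c2 intron_t2c.txt "$@")
  echo "${cmd[*]}"
}

# Single-nucleus data: spliced and unspliced counts end up summed.
kb_count_cmd nucleus R1.fastq.gz R2.fastq.gz
```

Swapping nucleus for lamanno would request separate spliced and unspliced matrices for RNA velocity; once the printed command looks right for your data, it can be run directly.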

Briefly, t2g.txt contains the transcript-to-gene mappings, cdna_t2c.txt lists all the cDNA (spliced) transcripts, and intron_t2c.txt lists all the "intronic" (i.e., unspliced) transcripts. The t2c files hold transcript names only; the corresponding sequences are in cdna.fa and intron.fa.
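To make the relationship between these three text files concrete, here is a toy sketch with made-up IDs. It assumes t2g.txt is tab-separated with the transcript ID in column 1 and the gene ID in column 2, and that each t2c file lists one transcript ID per line; real kb ref output may have extra columns and different ID naming, and genes_for is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Toy illustration of how t2g.txt relates to the two t2c files.
# Assumptions: t2g.txt is "transcript<TAB>gene"; t2c files list one transcript ID
# per line. All IDs are made up; real kb ref naming may differ.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# In this toy t2g.txt, both spliced and intronic IDs map to their gene.
printf 'TX1\tGENE_A\nTX1_intron\tGENE_A\nTX2\tGENE_B\nTX2_intron\tGENE_B\n' > "$tmp/t2g.txt"
printf 'TX1\nTX2\n' > "$tmp/cdna_t2c.txt"
printf 'TX1_intron\nTX2_intron\n' > "$tmp/intron_t2c.txt"

# Hypothetical helper: for each ID in a t2c file, look up its gene in t2g.txt.
genes_for() {
  awk -F'\t' 'NR == FNR { gene[$1] = $2; next } { print $1 "\t" gene[$1] }' \
    "$tmp/t2g.txt" "$1"
}

echo "spliced:"
genes_for "$tmp/cdna_t2c.txt"
echo "unspliced:"
genes_for "$tmp/intron_t2c.txt"
```

The point of the toy: t2g.txt ties every transcript, spliced or intronic, back to a gene, while each t2c file just says which transcript IDs belong to which category.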

nucleus is used for single-nucleus data, while lamanno is used for RNA velocity. There are subtle differences between the two workflows: for nucleus, the spliced and unspliced matrices are summed into one, whereas for RNA velocity, separate matrices are generated that can be fed directly into the velocyto workflow.
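The practical difference can be illustrated with a toy example for a single cell, using made-up per-gene counts stored as gene<TAB>count lines. This only mimics the idea described above (two separate layers versus one summed layer); it is not how kb implements either workflow, and sum_layers is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Toy illustration only: made-up per-gene counts for one cell ("gene<TAB>count").
# Mimics the idea of the two workflows; it is not kb's actual implementation.
set -euo pipefail

spliced=$(printf 'GENE_A\t3\nGENE_B\t1')
unspliced=$(printf 'GENE_A\t5\nGENE_C\t2')

# lamanno-style: keep the two layers separate (what velocity tools expect).
echo "spliced layer:"
echo "$spliced"
echo "unspliced layer:"
echo "$unspliced"

# nucleus-style: sum spliced + unspliced counts per gene into one layer.
# Hypothetical helper; output is sorted by gene name.
sum_layers() {
  printf '%s\n%s\n' "$1" "$2" |
    awk -F'\t' '{ total[$1] += $2 } END { for (g in total) print g "\t" total[g] }' |
    sort
}

echo "nucleus-style (summed):"
sum_layers "$spliced" "$unspliced"
```

Genes seen in only one layer (GENE_B, GENE_C here) simply carry their single count into the summed result.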

Now I am a bit confused.

So, when running kb ref like this:

kb ref -i index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa \
-c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno -n 8 \
Mus_musculus.GRCm38.dna.primary_assembly.fa.gz Mus_musculus.GRCm38.98.gtf.gz

Do t2g.txt, cdna.fa, intron.fa, cdna_t2c.txt, and intron_t2c.txt get generated?

One reason I am confused is that I was sent files built with the Comparative Annotation Toolkit, plus a few additional items, and I haven't fully made sense of all of them yet.

Among them are t2g files such as cDNA_introns_t2g.txt, introns_t2g.txt, and cDNA_t2g.txt, as well as FASTA files such as cDNA.fa and introns.fa, etc.

So part of me thought these were needed to build the index.

Correct, kb ref generates all of those files. They are not inputs to index building; rather, the index-building step produces them.
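One way to confirm that kb ref wrote everything before moving on to kb count is a quick existence check. This sketch assumes the output names used in the kb ref command in the question (index.idx, t2g.txt, cdna.fa, intron.fa, cdna_t2c.txt, intron_t2c.txt) and takes the directory to check as its first argument; check_kb_ref_outputs is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sanity check after kb ref: confirm the files it was asked to write exist and
# are non-empty. Names match the -i/-g/-f1/-f2/-c1/-c2 values used in this
# thread; adjust them if your command used different ones.
set -euo pipefail

# Hypothetical helper: returns 0 if all outputs are present, 1 otherwise.
check_kb_ref_outputs() {
  local dir="$1" missing=0 f
  for f in index.idx t2g.txt cdna.fa intron.fa cdna_t2c.txt intron_t2c.txt; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Check the directory given as $1, or the current directory by default.
if check_kb_ref_outputs "${1:-.}"; then
  echo "all kb ref outputs present"
else
  echo "some kb ref outputs are missing; re-run kb ref" >&2
fi
```

Because these files are outputs rather than inputs, a failed check points to an incomplete kb ref run, not to files you need to supply yourself.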