Kallisto bustools index

2.9 years ago

Shawn ▴ 20

I am trying to build an index for a single-nucleus experiment using kallisto, and I was wondering if someone could help break down the following kb ref command for me.

I am a bit confused about the exact significance of t2g.txt, cdna_t2c.txt, and intron_t2c.txt.

I am also not 100% sure about the difference between the lamanno and nucleus workflows.

kb ref -i index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa \
-c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno -n 8 \
Mus_musculus.GRCm38.dna.primary_assembly.fa.gz Mus_musculus.GRCm38.98.gtf.gz

updated 19 months ago by dsull ★ 5.8k • written 2.9 years ago by Shawn ▴ 20

Hi there,

First of all, thank you for opening this issue; it helped me better understand the command parameters. But here is my issue:

I am building a mouse index using kb ref with the following command line:

kb ref -i index_mm_98.idx -g t2g.txt -f1 /home/younsi/cdna.fa -f2 ./introns.fa -c1 cDNA_t2c.txt -c2 introns_t2c.txt --workflow=lamanno ./Mus_musculus.GRCm38.cdna.all.fa ./Mus_musculus.GRCm38.98.gtf --overwrite

In my case, after running kb ref, the t2g.txt, cDNA_t2c.txt, and introns_t2c.txt files are... EMPTY. Looking back at kb --help, I understand that all of these files are supposed to be generated:

required arguments:

  -i INDEX              Path to the kallisto index **to be constructed.**

  -g T2G                Path to transcript-to-gene mapping **to be generated**

  -f1 FASTA             [Optional with -d] Path to the cDNA FASTA (lamanno, nucleus) or mismatch FASTA (kite) **to be generated**


required arguments for `lamanno` and `nucleus` workflows:

  -f2 FASTA             Path to the intron FASTA **to be generated**

  -c1 T2C               Path **to generate** cDNA transcripts-to-capture

  -c2 T2C               Path **to generate** intron transcripts-to-capture
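
One way to confirm which outputs came back empty, and to rule out one possible cause, is a pair of quick checks. This is a minimal bash sketch, not part of kb: check_outputs and count_shared_seqnames are hypothetical helper names. The second check rests on an assumption worth verifying: kb ref's positional FASTA is meant to be the genome (like the Mus_musculus.GRCm38.dna.primary_assembly.fa.gz file in the original question), so its sequence names should match column 1 of the GTF.

```shell
#!/usr/bin/env bash
# Sketch of two sanity checks around kb ref (helper names are hypothetical).

# Report every file that is missing or empty; return 1 if any was found.
check_outputs() {
  local status=0 f
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f"
      status=1
    fi
  done
  return "$status"
}

# Print how many sequence names are shared between a FASTA file (first word
# of each '>' header) and a GTF file (column 1, comment lines skipped).
# Expects uncompressed files; decompress .gz inputs first.
count_shared_seqnames() {
  local fasta=$1 gtf=$2
  LC_ALL=C comm -12 \
    <(grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*$//' | LC_ALL=C sort -u) \
    <(grep -v '^#' "$gtf" | cut -f1 | LC_ALL=C sort -u) \
    | wc -l | tr -d ' '
}
```

For example, check_outputs t2g.txt cDNA_t2c.txt introns_t2c.txt lists each empty file. Running count_shared_seqnames on Mus_musculus.GRCm38.cdna.all.fa and Mus_musculus.GRCm38.98.gtf would likely print 0, because Ensembl cDNA headers start with transcript IDs rather than chromosome names.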

Something is very unclear to me: what could I be doing wrong? How did you solve your problem, GenoMax?

Thanks in advance for your help.

updated 19 months ago by GenoMax 141k • written 19 months ago by liliayounsi • 0

Please create a new question rather than posting this as an answer to an existing question.

19 months ago by dsull ★ 5.8k

Also, you cross-posted here: https://github.com/pachterlab/kallistobustools/issues/44

(and I answered there)

19 months ago by dsull ★ 5.8k

GenoMax · Answer 1 · 2021-06-12

2.9 years ago

dsull ★ 5.8k

You want to feed those files into kb count.

Briefly, t2g.txt contains the transcript-to-gene mapping, cdna_t2c.txt lists all the cDNA (spliced) transcripts, and intron_t2c.txt lists all the "intronic" (i.e., unspliced) transcripts.
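
To see how the three files relate, a small bash/awk tally can help. This is a sketch under assumed layouts (check them with head first): t2g.txt is taken to be tab-separated with the transcript ID in column 1 and the gene ID in column 2, and each *_t2c.txt file is taken to list one transcript ID per line. summarize_t2c is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Tally kb ref outputs. Assumed layouts (not confirmed by kb docs here):
#   t2g.txt    tab-separated, transcript ID in column 1, gene ID in column 2
#   *_t2c.txt  one transcript ID per line
summarize_t2c() {
  local t2g=$1 cdna_t2c=$2 intron_t2c=$3
  awk -F'\t' '
    # First file: spliced (cDNA) transcript IDs.
    FILENAME == ARGV[1] { cdna[$1] = 1; next }
    # Second file: unspliced (intronic) transcript IDs.
    FILENAME == ARGV[2] { intron[$1] = 1; next }
    # Third file (t2g): classify each transcript and count distinct genes.
    {
      if (!($2 in genes)) { genes[$2] = 1; n_genes++ }
      if ($1 in cdna) n_cdna++
      else if ($1 in intron) n_intron++
      else n_unlisted++
    }
    END {
      printf "genes\t%d\n", n_genes
      printf "cdna_transcripts\t%d\n", n_cdna
      printf "intron_transcripts\t%d\n", n_intron
      printf "unlisted\t%d\n", n_unlisted
    }' "$cdna_t2c" "$intron_t2c" "$t2g"
}
```

Running summarize_t2c t2g.txt cdna_t2c.txt intron_t2c.txt prints four tab-separated counts. A nonzero unlisted count would mean some transcripts in t2g.txt appear in neither capture list.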

nucleus is used for single-nucleus data, while lamanno is used for RNA velocity. There are subtle differences between the two workflows: for example, with nucleus the spliced and unspliced matrices are summed, while with lamanno (RNA velocity) separate spliced and unspliced matrices are generated that can be fed directly into the velocyto workflow.
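
Feeding the kb ref outputs into kb count for either workflow might look like this. The flags (-i, -g, -x, -o, -c1, -c2, --workflow) follow the kb count interface from around the time of this thread, but check kb count --help for your installed version. The technology string 10xv3, the output directories, and the FASTQ names are placeholders, and kb_count_cmd is a hypothetical wrapper that only prints the command unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: build a kb count command for a given workflow.
# Prints the command by default; set DRY_RUN=0 to actually execute it.
kb_count_cmd() {
  local workflow=$1 outdir=$2
  shift 2  # remaining arguments are the FASTQ files
  local -a cmd
  cmd=(kb count -i index.idx -g t2g.txt -x 10xv3 -o "$outdir"
       -c1 cdna_t2c.txt -c2 intron_t2c.txt
       --workflow "$workflow" "$@")
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Single-nucleus counts (spliced + unspliced summed):
kb_count_cmd nucleus out_nucleus R1.fastq.gz R2.fastq.gz
# RNA velocity (separate spliced and unspliced matrices):
kb_count_cmd lamanno out_lamanno R1.fastq.gz R2.fastq.gz
```

Only the --workflow value and the output directory change between the two runs; the index, t2g.txt, and both t2c files come straight from kb ref.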

2.9 years ago by dsull ★ 5.8k

Now I am a bit confused.

So, when running kb ref like this:

kb ref -i index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa \
-c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno -n 8 \
Mus_musculus.GRCm38.dna.primary_assembly.fa.gz Mus_musculus.GRCm38.98.gtf.gz

do t2g.txt, cdna.fa, intron.fa, cdna_t2c.txt, and intron_t2c.txt all get generated?

One of the reasons I am confused is that I was sent files built using the Comparative Annotation Toolkit, plus a few additional items, and I haven't fully made sense of everything yet.

Among them are t2g-style files such as cDNA_introns_t2g.txt, introns_t2g.txt, and cDNA_t2g.txt, as well as cDNA.fa, introns.fa, etc.

So, part of me thought these were needed when building the index.

updated 2.8 years ago by GenoMax 141k • written 2.8 years ago by Shawn ▴ 20

Correct, all those files are generated via kb ref. They are not needed for building the index; rather, the index building step generates those files.

2.8 years ago by dsull ★ 5.8k