Kallisto-Bustools Output Problems
3 months ago

I used Kallisto-Bustools to quantify snRNA-seq data. The output lists Ensembl gene IDs, whereas most tutorials seem to show gene symbols.

First 10 lines of t2g.txt input file:

  ENST00000456328.2 ENSG00000223972.5 DDX11L1
  ENST00000450305.2 ENSG00000223972.5 DDX11L1
  ENST00000488147.1 ENSG00000227232.5 WASH7P
  ENST00000619216.1 ENSG00000278267.1 MIR6859-1
  ENST00000473358.1 ENSG00000243485.5 MIR1302-2HG
  ENST00000469289.1 ENSG00000243485.5 MIR1302-2HG
  ENST00000607096.1 ENSG00000284332.1 MIR1302-2
  ENST00000417324.1 ENSG00000237613.2 FAM138A
  ENST00000461467.1 ENSG00000237613.2 FAM138A
  ENST00000606857.1 ENSG00000268020.3 OR4G4P

First 10 lines of genes.txt output file:

  ENSG00000001460.18
  ENSG00000001461.17
  ENSG00000010072.16
  ENSG00000008118.10
  ENSG00000009780.15
  ENSG00000048707.15
  ENSG00000034971.17
  ENSG00000059588.10
  ENSG00000041988.15
  ENSG00000049245.13

a. Would swapping the positions of the gene ID and gene symbol columns in t2g.txt give correct gene symbols in the output?
b. Are there any other solutions besides manually changing the names?

Thanks for your time.

Bustools Kallisto • 541 views
3 months ago
dsull ★ 5.8k
  1. In kb-python (around version 0.27.3 or so), you could run kb count with the --gene-names option to get gene names instead of gene IDs.

  2. Yes, swapping the gene ID and gene name columns in the t2g.txt file would work. Be careful if you do so: if the gene name field is empty (i.e. the gene ID has no corresponding gene name), errors will arise.

  3. In any case, you could use R or python to convert your genes.txt gene IDs into the corresponding gene names based on the t2g.txt file.
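For option 3, a minimal Python sketch is shown here. It assumes t2g.txt is tab-separated with the columns transcript ID, gene ID, gene name (as in the excerpt in the question), and that genes.txt holds one gene ID per line. The function names and the output file name are hypothetical, chosen only for illustration. When a gene has no name, the gene ID is kept so that no line ends up empty.

```python
def load_id_to_name(t2g_path):
    """Build a gene ID -> gene name map from a tab-separated t2g file.

    Assumed column order: transcript ID, gene ID, gene name.
    """
    id_to_name = {}
    with open(t2g_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            gene_id = fields[1]
            name = fields[2].strip() if len(fields) > 2 else ""
            # Fall back to the gene ID when the name field is missing or empty.
            id_to_name.setdefault(gene_id, name or gene_id)
    return id_to_name


def translate_genes(genes_path, t2g_path, out_path):
    """Write one gene name per line, in the same order as genes.txt."""
    id_to_name = load_id_to_name(t2g_path)
    with open(genes_path) as src, open(out_path, "w") as dst:
        for line in src:
            gene_id = line.strip()
            # IDs absent from the t2g file are written through unchanged.
            dst.write(id_to_name.get(gene_id, gene_id) + "\n")
```

A call such as translate_genes("genes.txt", "t2g.txt", "genes.names.txt") keeps the row order, so the new file lines up with the count matrix. Note that different gene IDs can share one gene name, so the translated names are not guaranteed to be unique.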

(Side note: There's a new version of kb-python, 0.28.0, in which a gene names file is output automatically by default. It requires upgrading your index and several other things, but it improves memory usage and accuracy.)


I installed version 0.28.0, but the index could not be built properly (the output files were extremely small, with no errors reported). I downloaded a prebuilt index to run the program instead, but it threw the following error:

kb count -i index.idx -g t2g.txt -c1 cdna_t2c.txt -c2 intron_t2c.txt -x 10xv3 -o output -t 4 --workflow nucleus /mnt/d/Raw/GSE219280/GSM6781917/SRR22512224/SRR22512224_1.fastq /mnt/d/Raw/GSE219280/GSM6781917/SRR22512224/SRR22512224_2.fastq

usage: kb [-h] [--list] <CMD> ...
kb: error: --sum incompatible with lamanno/nucleus

I used awk to create a new file with the corresponding gene symbols, and the result is satisfactory. Version 0.26.0 was installed by default, so I had to create a new conda environment to upgrade to 0.27.3, which gave a good increase in speed.

Thank you very much, have a great day.


This is as I said: there are many changes you need to make in order to use kb-python 0.28.0. For one, --workflow nucleus is no longer supported. We describe the usage of 0.28.0 in a new preprint.

If you already have 0.27.3 working, just go with that.

