Building an index with kallisto keeps getting killed
2
0
3 months ago
pubsurfted ▴ 40

Hello,

I have been trying to create a kallisto index using the following command:

kallisto index -i Glycine-Max.idx Glycine_max.Glycine_max_v2.1.cdna.all.fa.gz


It does run but soon encounters a problem:

[build] loading fasta file Glycine_max.Glycine_max_v2.1.cdna.all.fa.gz
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
from 4 target sequences
[build] warning: replaced 64340 non-ACGUT characters in the input sequence
with pseudorandom nucleotides
[build] counting k-mers ... Killed


What causes this error, and how can I fix it?

Edit: I'm using how-are-we-stranded-here, which depends on a kallisto index. I know there are other tools for building an index, but I'm limited to kallisto.

Thank you for any replies and best wishes.

kallisto • 728 views
1

"Killed" usually means the process ran out of memory and the operating system terminated it. What is your setup, and can you get access to a computer with more memory?
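A quick way to test the out-of-memory theory is sketched below. It assumes a Linux system where /proc/meminfo exists and the kernel log can be read with dmesg (which may need sudo). The helper name mem_available_gib is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch, assuming Linux: check how much memory is free and whether the
# kernel's out-of-memory (OOM) killer has terminated a process recently.

# Hypothetical helper: print MemAvailable in GiB from a meminfo-style file.
# $1 is the file to read (defaults to /proc/meminfo).
mem_available_gib() {
    awk '/^MemAvailable:/ { printf "%.2f\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    echo "Available memory (GiB): $(mem_available_gib)"
fi

# Search the kernel log for OOM killer messages; reading it may require sudo.
dmesg 2>/dev/null | grep -i -E 'out of memory|killed process' \
    || echo "No OOM messages found (or the kernel log is not readable)."
```

If a "Killed process" line naming kallisto shows up right after a failed run, memory is almost certainly the problem.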

0

I'm a BS student, so I currently can't afford to add more memory to my peasant computer.

3
3 months ago

Based on dsull's proposal I have made a notebook that creates the index.

Because the index has already been generated, you can download it directly (note: you need a Google account). However, the download will take a while, so you might want to consider doing your whole analysis in Colab. I added a gzip step because downloads from Colab can be slow, so you will need to gunzip the idx file if you download it this way.

2
3 months ago
dsull ★ 4.0k

What version of kallisto are you using? You can check via kallisto --version. Make sure you're using kallisto 0.48.0 (the latest version).

Running the following works just fine on my laptop (a MacBook Pro with 16 GB of RAM):

wget ftp.ensemblgenomes.org/pub/plants/current/fasta/glycine_max/cdna/Glycine_max.Glycine_max_v2.1.cdna.all.fa.gz
kallisto index -i Glycine-Max.idx Glycine_max.Glycine_max_v2.1.cdna.all.fa.gz


And outputs the following:

[build] loading fasta file Glycine_max.Glycine_max_v2.1.cdna.all.fa.gz
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
from 4 target sequences
[build] warning: replaced 64340 non-ACGUT characters in the input sequence
with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ...  done
[build] creating equivalence classes ...  done
[build] target de Bruijn graph has 1310952 contigs and contains 95464598 k-mers


It takes 5 minutes 23 seconds, and peak memory usage is 5.03 GB.
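To take the same kind of measurement on your own machine, one option on Linux is GNU time, sketched below. This is not necessarily how the numbers above were obtained. It assumes GNU time is installed at /usr/bin/time (the macOS version uses -l and a different output format), and the helper name peak_rss_gib is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch, assuming GNU time on Linux: report the peak memory of a command.

# Hypothetical helper: read `time -v` output on stdin and print the
# "Maximum resident set size" (reported in kbytes) converted to GiB.
peak_rss_gib() {
    awk -F': ' '/Maximum resident set size \(kbytes\)/ { printf "%.2f\n", $2 / 1048576 }'
}

FASTA="Glycine_max.Glycine_max_v2.1.cdna.all.fa.gz"

# Only run the measurement when kallisto, GNU time and the FASTA file exist.
if command -v kallisto >/dev/null 2>&1 && [ -x /usr/bin/time ] && [ -f "$FASTA" ]; then
    # time -v writes its report to stderr, so capture stderr in a log file.
    /usr/bin/time -v kallisto index -i Glycine-Max.idx "$FASTA" 2> time.log
    echo "Peak memory (GiB): $(peak_rss_gib < time.log)"
fi
```

If the run is killed, the figure in time.log shows how far it got before running out, which you can compare against the memory you actually have free.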

0

Hello, thank you for taking the time to reply to my post.

My goal is to run the how-are-we-stranded-here tool, and one of its dependencies is kallisto version 0.44.0, so I have to use that version. But it keeps getting stuck at the kallisto indexing step. :(

2

That tool should be compatible with indices generated from kallisto 0.48.0 (the latest version). The format of the index has not changed between 0.44.0 and 0.48.0 (although the latest version probably includes some optimizations and bug fixes during the index generation step).

Nonetheless, if you have a computer with less than 6 GB of memory available, why not just run things on Google Colab? Google Colab, which is free, comes with more than enough memory to generate kallisto indices. You can install kallisto on Google Colab, generate the index using the command supplied above, and then download the index file from Google Colab.
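For anyone new to Colab, those three steps could look roughly like the sketch below, run as shell commands (for example in a cell that starts with %%bash). The release URL pattern and the assumption that the tarball unpacks into a kallisto/ directory are my guesses about how the prebuilt Linux binaries are published, so check the kallisto releases page before relying on them. The function names are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch for a Linux shell such as a Colab cell; URLs and layout are assumptions.

KALLISTO_VERSION="0.48.0"
FASTA="Glycine_max.Glycine_max_v2.1.cdna.all.fa.gz"

# Hypothetical helper: build the assumed download URL for a given version.
kallisto_release_url() {
    echo "https://github.com/pachterlab/kallisto/releases/download/v$1/kallisto_linux-v$1.tar.gz"
}

# Hypothetical helper: install kallisto, fetch the transcriptome, build the index.
build_index() {
    wget -q "$(kallisto_release_url "$KALLISTO_VERSION")" -O kallisto.tar.gz
    tar -xzf kallisto.tar.gz    # assumed to unpack into ./kallisto/
    wget -q "ftp.ensemblgenomes.org/pub/plants/current/fasta/glycine_max/cdna/$FASTA"
    ./kallisto/kallisto index -i Glycine-Max.idx "$FASTA"
    gzip -k Glycine-Max.idx     # compressed copy downloads faster; gunzip it locally
}

# Run only when asked to, so defining the functions has no side effects.
if [ "${1:-}" = "run" ]; then
    build_index
fi
```

In a Colab cell you would call build_index directly after the definitions, then download Glycine-Max.idx.gz from the file browser in the left sidebar.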

1

Hi dsull, running this in Colab is a great idea, but it may need some explanation for someone new to Colab.