Question: Should kallisto take hours to run?
Kristin Muench (United States) wrote, 20 months ago:

Hello,

I'm trying to align some test .fastq files to a reference using kallisto.

The reference is composed of human transcriptome + a couple of plasmid sequences (~12000 characters) stored in .fa format. I generated the index using this command:

humFa="/path/to/ucsc/fasta/files/RefGenomes/H_sapiens/hg19/*.fa"
plasFa="/path/to/plasmid/fasta/files/*.fa"

kallisto index -i humPlas_kallisto_transcripts.idx $humFa $plasFa --make-unique

...and the resulting file is 70.49 GB.

I have tried to align paired-end .fastq files to this index using kallisto, but I keep running into issues:

On my Mac laptop (macOS 10.13.3, 3.5 GHz Processor, 16 GB memory):

The issue seems to be that the program dies prematurely, but I don't know why. I run this with only Terminal open, and I don't touch anything while it's running:

./kallisto quant -i ~/Desktop/humPlas_kallisto_transcripts.idx -o ~/Desktop/kallOutput/ -b 100 ~/Desktop/tstFastq/R1.fastq ~/Desktop/tstFastq/R2.fastq

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 50,798
[index] number of k-mers: 2,969,625,638
Killed: 9
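`Killed: 9` is the shell reporting that the process was terminated by signal 9 (SIGKILL). A program cannot catch SIGKILL, which is why kallisto prints no error message of its own. One plausible cause here, not confirmed in this thread, is running out of memory: the index is 70.49 GB and the laptop has 16 GB. As a sketch, the hypothetical helper `report_exit` below (not part of kallisto) runs any command and, relying on the shell convention that death by signal N shows up as exit status 128 + N, reports which signal ended it.

```shell
#!/bin/sh
# report_exit is a hypothetical helper, not part of kallisto.
# It runs the given command and explains a non-zero exit status.
report_exit() {
  "$@"
  status=$?
  if [ "$status" -gt 128 ]; then
    # By shell convention, 128 + N means "terminated by signal N"
    # (a program could also choose to exit with such a code itself).
    sig=$((status - 128))
    echo "command was killed by signal $sig (SIG$(kill -l "$sig"))" >&2
  elif [ "$status" -ne 0 ]; then
    echo "command failed with exit status $status" >&2
  fi
  return "$status"
}
```

To use it, prefix the original command, e.g. `report_exit ./kallisto quant -i ... `; a SIGKILL shows up as exit status 137 and the message `killed by signal 9 (SIGKILL)`.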

On a computational cluster:

I run the same command (as above):

kallisto quant -i $kallistoIdx -o $outputFileLoc -t 4 -b 100 $Read1 $Read2

with the following cluster queue settings (SLURM):

#SBATCH --nodes=4
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=4
#SBATCH --mem=256G

...and then the job auto-aborts after six hours because it hasn't completed in that time.

Am I correct in thinking this kallisto run is taking suspiciously long or being suspiciously buggy? Has anyone run into either of these issues? Am I missing anything that might be making kallisto slower?

Thank you for your help!

EDIT - SOLUTION

Thanks for the feedback: indeed, the problem was that I was using UCSC's genome files, not a transcriptome.

I got the transcriptome corresponding to hg19 from the Ensembl archive:

wget ftp://ftp.ensembl.org/pub/release-67/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh37.67.cdna.all.fa.gz

Then I regenerated the index using that file in my $humFa path. The resulting index was much smaller.

Now everything is working well, and I can align to my plasmids + human transcriptome. Appreciate the help!

h.mon (Brazil) wrote, 20 months ago:

"The reference is composed of human genome + a couple of plasmid sequences (~12000 characters) stored in .fa format."

kallisto uses a reference transcriptome.
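A quick way to tell which kind of reference you have is to count the FASTA records and look at the longest one: a genome assembly such as hg19 contains chromosome-length sequences tens to hundreds of megabases long, whereas records in a cDNA transcriptome are transcript-length and far shorter. (The ~2.97 billion k-mers in the index log above is also roughly the size of the human genome.) The `fasta_summary` function below is a hypothetical helper written with standard `awk`; it is not part of kallisto.

```shell
#!/bin/sh
# fasta_summary is a hypothetical helper, not part of kallisto.
# It prints the number of records, total bases, and longest record length
# for one or more FASTA files (or standard input if no file is given).
fasta_summary() {
  awk '
    /^>/ {                               # header line starts a new record
      if (cur > max) max = cur
      n++; cur = 0; next
    }
    {                                    # sequence line
      gsub(/[ \t\r]/, "")                # ignore stray whitespace or CR
      cur += length($0); total += length($0)
    }
    END {
      if (cur > max) max = cur
      # %.0f avoids 32-bit overflow in some awks for genome-sized totals
      printf "records=%d total_bp=%.0f longest_bp=%.0f\n", n, total, max
    }' "$@"
}
```

For example, run `fasta_summary /path/to/ucsc/fasta/files/RefGenomes/H_sapiens/hg19/*.fa`, or for the gzipped Ensembl file, `gzip -dc Homo_sapiens.GRCh37.67.cdna.all.fa.gz | fasta_summary`, and compare the `longest_bp` values.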


Oops, I mistyped; you are correct. Modifying the original post.

(reply by Kristin Muench, 20 months ago)

Ah! I think I see your point. I thought the fasta files from hg19 represented a human transcriptome, but I see they represent a reference genome: https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13/

I'll re-download a human reference transcriptome and try again. Thank you!

(reply by Kristin Muench, 20 months ago)

If your plasmids have transcribed and non-transcribed parts, or multiple genes, you can create their "transcriptome" FASTA with gffread from StringTie, using the original FASTA + GTF.

(reply by h.mon, 20 months ago)

If an answer was helpful, you should upvote it; if an answer resolved your question, you should mark it as accepted.

(reply by WouterDeCoster, 20 months ago)
Hussain Ather (National Institutes of Health, Bethesda, MD) wrote, 20 months ago:

Kallisto has taken a few hours to run before. It should be fine.
