Question: HISAT2 index builder seems to be running indefinitely
15 months ago by
caranlove10 wrote:

Hello, I am attempting to create a new index from Ensembl reference files, and the index builder is taking far longer than I am used to when creating a new index. The builder command has now been running for >48 hrs, and I am a bit confused about why it is taking so long and whether it is working at all.

I am running: hisat2-build -p 6 --ss /path/to/CanFam3.1.97_intron.bed --exon /path/to/CanFam3.1.97_exonsFile.table -f /path/to/Canis_familiaris.CanFam3.1.dna.toplevel.fa CanFam3.1.97

And the output I have gotten from this run so far is:

  Output files: "CanFam3.1.97.*.ht2"
  Line rate: 7 (line is 128 bytes)
  Lines per side: 1 (side is 128 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Local offset rate: 3 (one in 8)
  Local fTable chars: 6
  Local sequence length: 57344
  Local sequence overlap between two consecutive indexes: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
Reading reference sizes
  Time reading reference sizes: 00:00:17
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:13

But it has been on this last 'Time to join reference sequences' for >12 hrs.
The .fa file appears to be formatted correctly: 

>1 dna:chromosome chromosome:CanFam3.1:1:1:122678785:1 REF

As does the gtf file that the intron and exon files were created from:

X       ensembl gene    1575    5716    .       +       .       gene_id "ENSCAFG00000010935"; gene_version "3"; gene_source "ensembl"; gene_biotype "protein_coding";
X       ensembl transcript      1575    5716    .       +       .       gene_id "ENSCAFG00000010935"; gene_version "3"; transcript_id "ENSCAFT00000017396"; transcript_version "3"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_source "ensembl"; transcript_biotype "protein_coding";

Can anyone help me determine why this index is taking far more time to run than when I have created them in the past?

Thank you for your help!

hisat2 rna-seq
modified 4 months ago by smg0 • written 15 months ago by caranlove10

Is it still running? You can check with the top command in a new terminal window.
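
As a rough sketch of that check (assuming a bash shell and that the job runs under your own user; `check_job` is a hypothetical helper name, not part of HISAT2), something like the following reports whether a given PID is alive and, where `ps` is available, its state, elapsed time, CPU and memory use:

```shell
#!/usr/bin/env bash
# check_job PID
# Hypothetical helper: prints whether PID is alive, then a one-line ps summary.
# Assumption: the process belongs to the current user (kill -0 fails otherwise).
check_job() {
    local pid="$1"

    # kill -0 sends no signal; it only tests whether the process can be signalled.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "PID $pid: not running"
        return 1
    fi
    echo "PID $pid: running"

    # STAT: R = running on a CPU, S = sleeping, D = uninterruptible wait (often disk or swap).
    # ETIME is wall-clock time since start; RSS is resident memory in kilobytes.
    if command -v ps >/dev/null 2>&1; then
        ps -o pid,stat,etime,pcpu,pmem,rss,args -p "$pid" || true
    fi
}
```

For example, `for pid in $(pgrep -f hisat2-build); do check_job "$pid"; done` would run it on every process whose command line mentions hisat2-build. A job that stays in state `R` with high %CPU is probably still computing; one that sits in `D` with little CPU and a large %MEM may be swapping, which is worth ruling out because the HISAT2 manual notes that builds using --ss/--exon need far more memory than a plain build.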

written 15 months ago by ATpoint42k

Yes, it does appear to still be running.

written 15 months ago by caranlove10

Have you solved the problem yet? I have the same problem.

written 4 months ago by smg0