Question: Hisat2 index builder seems to be running indefinitly
0
gravatar for caranlove
11 weeks ago by
caranlove0
caranlove0 wrote:

Hello, I am attempting to create an new index from Emsemble reference files, and the index builder is taking far longer than what I am used to when creating a new index. The builder command has been running now for >48 hrs and I am a bit confused on why it is taking so long/if it is working.

I am running: hisat2-build -p 6 --ss /path/to/CanFam3.1.97_intron.bed --exon /path/to/CanFam3.1.97_exonsFile.table -f /path/to/Canis_familiaris.CanFam3.1.dna.toplevel.fa CanFam3.1.97

And the output I have gotten from this run so far is:

   Settings:
  Output files: "CanFam3.1.97.*.ht2"
  Line rate: 7 (line is 128 bytes)
  Lines per side: 1 (side is 128 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Local offset rate: 3 (one in 8)
  Local fTable chars: 6
  Local sequence length: 57344
  Local sequence overlap between two consecutive indexes: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  /scratch/clove/canids/Reference/Genome/Ensemble/Canis_familiaris.CanFam3.1.dna.toplevel.fa
Reading reference sizes
  Time reading reference sizes: 00:00:17
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:13


But it has been on this last 'Time to join reference sequences' for >12 hrs.
The .fa file appears to be formatted correctly: 

>1 dna:chromosome chromosome:CanFam3.1:1:1:122678785:1 REF
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTATGTGAGAAGATAGCTGAA
CGCCTTGTCCACATCATCTTACTGCTGAGAGTTGAGCTCACCCTCAGTCCCTCACAGTTC
CACACTGCCTGCAGAGTGAGTTTCCCATGTCTTCACCAGAGACTTTTGCCAGAGGCTTCT
GAGACGCAAGTTAACAATGCAGACCTGGAGGGTATCTCCAGGTGCAGTAGAGTGGTAATC
TCGGAACCTCCTGACTCAGAATACTGCTACCTTCACACTGTCATAAGAATGCAGCGAGTT
GAGAGCTGGCTTCTAGGCATGCTTCCTTTTGAGAGCTGAGGACAGGACAGAACCCTCCCG
CATCCTGCCTGACTGTAGACGTACCTGCTAACCTCCTCATGTTAGTGGCTGGGATAGATT
GTGGGAAAAGCATGTGTAAGCATTGGGCCTGAACTCCCGTGTATCTGAGTTGAATACAGC

As does the gtf file that the intron and exon files were created from:

X       ensembl gene    1575    5716    .       +       .       gene_id "ENSCAFG00000010935"; gene_version "3"; gene_source "ensembl"; gene_biotype "protein_coding";
X       ensembl transcript      1575    5716    .       +       .       gene_id "ENSCAFG00000010935"; gene_version "3"; transcript_id "ENSCAFT00000017396"; transcript_version "3"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_source "ensembl"; transcript_biotype "protein_coding";

Can anyone help me determine why this index is taking far more time to run than when I have created them in the past?

Thank you for your help!

hisat2 rna-seq • 112 views
ADD COMMENTlink modified 11 weeks ago by ATpoint25k • written 11 weeks ago by caranlove0

Does it still run? You can check with the top command in a new terminl window.

ADD REPLYlink written 11 weeks ago by ATpoint25k

Yes, it does appear to still be running.

ADD REPLYlink written 11 weeks ago by caranlove0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1986 users visited in the last hour