Hello, I am trying to build a new index from Ensembl reference files, and the index builder is taking far longer than I am used to. The build command has now been running for >48 hrs, and I am not sure why it is taking so long or whether it is working at all.
I am running:
hisat2-build -p 6 --ss /path/to/CanFam3.1.97_intron.bed --exon /path/to/CanFam3.1.97_exonsFile.table -f /path/to/Canis_familiaris.CanFam3.1.dna.toplevel.fa CanFam3.1.97
And the output I have gotten from this run so far is:

Settings:
  Output files: "CanFam3.1.97.*.ht2"
  Line rate: 7 (line is 128 bytes)
  Lines per side: 1 (side is 128 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Local offset rate: 3 (one in 8)
  Local fTable chars: 6
  Local sequence length: 57344
  Local sequence overlap between two consecutive indexes: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  /scratch/clove/canids/Reference/Genome/Ensemble/Canis_familiaris.CanFam3.1.dna.toplevel.fa
Reading reference sizes
  Time reading reference sizes: 00:00:17
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:13

But it has been on this last 'Time to join reference sequences' for >12 hrs.

The .fa file appears to be formatted correctly:

>1 dna:chromosome chromosome:CanFam3.1:1:1:122678785:1 REF
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTATGTGAGAAGATAGCTGAA
CGCCTTGTCCACATCATCTTACTGCTGAGAGTTGAGCTCACCCTCAGTCCCTCACAGTTC
CACACTGCCTGCAGAGTGAGTTTCCCATGTCTTCACCAGAGACTTTTGCCAGAGGCTTCT
GAGACGCAAGTTAACAATGCAGACCTGGAGGGTATCTCCAGGTGCAGTAGAGTGGTAATC
TCGGAACCTCCTGACTCAGAATACTGCTACCTTCACACTGTCATAAGAATGCAGCGAGTT
GAGAGCTGGCTTCTAGGCATGCTTCCTTTTGAGAGCTGAGGACAGGACAGAACCCTCCCG
CATCCTGCCTGACTGTAGACGTACCTGCTAACCTCCTCATGTTAGTGGCTGGGATAGATT
GTGGGAAAAGCATGTGTAAGCATTGGGCCTGAACTCCCGTGTATCTGAGTTGAATACAGC

As does the gtf file that the intron and exon files were created from:

X ensembl gene 1575 5716 . + . gene_id "ENSCAFG00000010935"; gene_version "3"; gene_source "ensembl"; gene_biotype "protein_coding";
X ensembl transcript 1575 5716 . + . gene_id "ENSCAFG00000010935"; gene_version "3"; transcript_id "ENSCAFT00000017396"; transcript_version "3"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_source "ensembl"; transcript_biotype "protein_coding";
Can anyone help me determine why this index build is taking so much longer than the ones I have created in the past?
Thank you for your help!