My task is to map Arabidopsis thaliana reads (FASTQ files) to a reference genome using HISAT2. I have downloaded the reference FASTA file and the annotation GTF file from the Ensembl website.
First I ran the Python scripts to extract the splice sites and exons. The commands are as follows.
python hisat2_extract_splice_sites.py File.gtf > splices.tsv
python hisat2_extract_exons.py file.gtf > exons.tsv
Then I ran the hisat2-build command. The command is as follows.
hisat2-build --ss splices.tsv --exon exons.tsv Fastafile.fa Some_name_to_build_index
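For readers who want to reproduce the steps above, here is a minimal sketch that chains them in one Bash function and stops early if an input or intermediate file is missing or empty. The names `annotation.gtf`, `genome.fa` and `arabidopsis_index` are placeholders, not the files from the post, and the helper `require_nonempty` is hypothetical. It assumes the two HISAT2 Python scripts are in the current directory, as in the commands above.

```shell
#!/usr/bin/env bash
# Sketch only: file names below are placeholders, not the poster's actual paths.

# Fail with a message if a file is missing or empty.
require_nonempty() {
    if [ ! -s "$1" ]; then
        echo "error: $1 is missing or empty" >&2
        return 1
    fi
}

# Run the three steps from the post, using one consistent GTF path.
build_index() {
    local gtf="$1" fasta="$2" prefix="$3"
    require_nonempty "$gtf"   || return 1
    require_nonempty "$fasta" || return 1
    python hisat2_extract_splice_sites.py "$gtf" > splices.tsv || return 1
    python hisat2_extract_exons.py "$gtf" > exons.tsv          || return 1
    # An empty splice-site or exon file usually points to a GTF problem.
    require_nonempty splices.tsv || return 1
    require_nonempty exons.tsv   || return 1
    hisat2-build --ss splices.tsv --exon exons.tsv "$fasta" "$prefix"
}

# Example call (commented out; adjust the paths before uncommenting):
# build_index annotation.gtf genome.fa arabidopsis_index
```

This does not change what hisat2-build does. It only makes sure the build is not started with unusable inputs.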
When I run this script, it executes for some time and then appears to get stuck. The Slurm log shows execution up to the step below. I can see data in some of the ".ht2" files. I want to know how I can change the command so that it runs and completes successfully.
Total RAM available: 750 GB
Max total RAM used by my command: 160 GB
My command is using one full core in the node.
I tried adding `-p 4` to the command. With this, the total memory used was 160 * 4 = 640 GB, but the job still got stuck at the same step shown below. Any help is much appreciated. I am totally new to this field. Thank you!
Output files: "arabidopsisz.*.ht2"
Line rate: 7 (line is 128 bytes)
Lines per side: 1 (side is 128 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Local offset rate: 3 (one in 8)
Local fTable chars: 6
Local sequence length: 57344
Local sequence overlap between two consecutive indexes: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
/home5/nrpandes/from_chris/bowtie1/Arabidopsis_thaliana.TAIR10.dna_rm.toplevel.fa
Reading reference sizes
Time reading reference sizes: 00:00:02
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:01
Time to read SNPs and splice sites: 00:00:18
is not reverse-deterministic, so reverse-determinize...
Generation 0 (1 -> 1 nodes, 0 ranks)
COUNTED NEW NODES: 0
COUNTED TEMP NODES: 0
RESIZED NODES: 0
RESIZED NODES: 0
MADE NEW NODES: 0
Generation 1 (1 -> 1 nodes, 0 ranks)
COUNTED NEW NODES: 0
COUNTED TEMP NODES: 0
RESIZED NODES: 0
RESIZED NODES: 0
MADE NEW NODES: 0
Generation 2 (1 -> 1 nodes, 0 ranks)
COUNTED NEW NODES: 0
COUNTED TEMP NODES: 0
RESIZED NODES: 0
RESIZED NODES: 0
MADE NEW NODES: 0
Generation 3 (1 -> 1 nodes, 0 ranks)
BUILT FROM_INDEX: 0
COUNTED NEW NODES: 0
COUNTED TEMP NODES: 0
RESIZED NODES: 0
RESIZED NODES: 0
MADE NEW NODES: 0
RESIZE NODES: 0
SORT NODES: 0
MERGE, UPDATE RANK: 0
Generation 4 (1 -> 1 nodes, 1 ranks)
What does "stuck" mean? Did you check with `top` whether it is still running, or is the tool crashing and throwing errors?
The tool is not crashing. It just stays there for hours (20 hrs) without progressing. It doesn't give me any more log output, and it doesn't update any of the ".ht2" files either.
I ran `top` and saw that the job is running and using 100% of one core, but none of the values change over time. Here is a screenshot of my top output.
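One way to tell a quiet job from one that is still writing output is to compare the sizes of the index files at two points in time. The sketch below assumes Bash and uses a placeholder index prefix `arabidopsis_index`. The function names `snapshot_index` and `index_changed` are hypothetical helpers, not part of HISAT2. Unchanged file sizes do not prove a hang, because hisat2-build may work in memory for a long time before writing anything.

```shell
#!/usr/bin/env bash
# Sketch: check whether the <prefix>.*.ht2 files grow between two snapshots.

# Print "file size_in_bytes" for every index file that exists for the prefix.
snapshot_index() {
    local prefix="$1" f
    for f in "$prefix".*.ht2; do
        [ -e "$f" ] || continue   # the glob matched nothing
        printf '%s %s\n' "$f" "$(wc -c < "$f" | tr -d ' ')"
    done
}

# Print "changed" and return 0 if two snapshots differ,
# otherwise print "unchanged" and return 1.
index_changed() {
    if [ "$1" != "$2" ]; then
        echo "changed"
        return 0
    fi
    echo "unchanged"
    return 1
}

# Example usage (commented out; the prefix is a placeholder):
# before=$(snapshot_index arabidopsis_index)
# sleep 600
# after=$(snapshot_index arabidopsis_index)
# index_changed "$before" "$after"
```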
%CPU is 100 and the status is Running, so it is still running. Wait until it finishes, or restart it with more cores to speed up the process.
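If the job is restarted with more cores under Slurm, the thread count passed to `hisat2-build -p` can be taken from the allocation so that the two stay in sync. The following is a rough sketch of a batch script, assuming Bash. The value of 4 cores and the names `genome.fa` and `arabidopsis_index` are placeholders. The post reports memory use rising to about 640 GB with `-p 4`, so the memory request for the job would need to allow for that. No memory or partition directive is shown here because those depend on the cluster.

```shell
#!/usr/bin/env bash
#SBATCH --cpus-per-task=4
# Sketch only: add the memory, time and partition directives your cluster requires.

# Use the number of cores Slurm allocated, or 1 when run outside Slurm.
threads_from_slurm() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Build the index with as many threads as were allocated.
# genome.fa and arabidopsis_index are placeholder names.
run_build() {
    hisat2-build -p "$(threads_from_slurm)" \
        --ss splices.tsv --exon exons.tsv \
        genome.fa arabidopsis_index
}

# Example call (commented out so the file can be sourced safely):
# run_build
```

Slurm sets `SLURM_CPUS_PER_TASK` only when `--cpus-per-task` is requested, which is why the function falls back to 1.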
Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.
Sure. Thank you for the suggestion and edit. I will use that in my future posts. Thank you!