Hisat2 build command gets stuck while executing.
4.0 years ago

My task is to map Arabidopsis thaliana reads (FASTQ files) to a reference genome using HISAT2. I downloaded the reference FASTA file and the annotation GTF file from the Ensembl website.

First I run the Python scripts that ship with HISAT2 to extract splice sites and exons. The commands are as follows.

python hisat2_extract_splice_sites.py file.gtf > splices.tsv

python hisat2_extract_exons.py file.gtf > exons.tsv

Then I run the hisat2 build command. The command is as follows.

hisat2-build --ss splices.tsv --exon exons.tsv Fastafile.fa Some_name_to_build_index
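
For reference, the three steps above can be wrapped in one small script so that they always run in the same order on the same files. This is only a sketch: the helper names `run_to` and `build_index`, the argument order, and the `DRY_RUN` switch are made up for illustration. The HISAT2 scripts and the `--ss`, `--exon`, and `-p` options are the ones already used in this post.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, argument order and DRY_RUN are illustrative assumptions.

# Print a command to stderr, then run it with stdout redirected to a file.
# With DRY_RUN=1 the command is printed but not executed.
run_to() {
    local out="$1"; shift
    echo "+ $* > $out" >&2
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@" > "$out"
    fi
}

# build_index <annotation.gtf> <genome.fa> <index_prefix> [threads]
build_index() {
    local gtf="$1" fasta="$2" prefix="$3" threads="${4:-1}"

    # Extract splice sites and exons from the annotation.
    run_to splices.tsv python hisat2_extract_splice_sites.py "$gtf" || return 1
    run_to exons.tsv python hisat2_extract_exons.py "$gtf" || return 1

    # Build the index using the extracted splice sites and exons.
    echo "+ hisat2-build -p $threads --ss splices.tsv --exon exons.tsv $fasta $prefix" >&2
    if [ "${DRY_RUN:-0}" != "1" ]; then
        hisat2-build -p "$threads" --ss splices.tsv --exon exons.tsv "$fasta" "$prefix"
    fi
}
```

Running `DRY_RUN=1 build_index file.gtf Fastafile.fa Some_name_to_build_index 4` prints the commands without executing them, which is a cheap way to check the file names before submitting the job.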

When I run this script, it executes for some time and then gets stuck. The SLURM log shows execution up to the step below. I can see data in some of the ".ht2" files. I want to know how I can change the command so that it runs to completion.

Total RAM available: 750 GB

Max total RAM used by my command: 160 GB

My command is using one full core in the node.

I tried adding -p 4 to the command. With this, the total memory used was 160 * 4 = 640 GB, but the job still got stuck at the same step shown below. Any help is much appreciated. I am totally new to this field. Thank you!

  Output files: "arabidopsisz.*.ht2"
  Line rate: 7 (line is 128 bytes)
  Lines per side: 1 (side is 128 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Local offset rate: 3 (one in 8)
  Local fTable chars: 6
  Local sequence length: 57344
  Local sequence overlap between two consecutive indexes: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8

Input files DNA, FASTA:

  /home5/nrpandes/from_chris/bowtie1/Arabidopsis_thaliana.TAIR10.dna_rm.toplevel.fa

Reading reference sizes

  Time reading reference sizes: 00:00:02

Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:01
  Time to read SNPs and splice sites: 00:00:18
    is not reverse-deterministic, so reverse-determinize...

Generation 0 (1 -> 1 nodes, 0 ranks)

COUNTED NEW NODES: 0
COUNTED TEMP NODES: 0
RESIZED NODES: 0
RESIZED NODES: 0
MADE NEW NODES: 0

Generation 1 (1 -> 1 nodes, 0 ranks)

COUNTED NEW NODES: 0
COUNTED TEMP NODES: 0
RESIZED NODES: 0
RESIZED NODES: 0
MADE NEW NODES: 0

Generation 2 (1 -> 1 nodes, 0 ranks)

COUNTED NEW NODES: 0
COUNTED TEMP NODES: 0
RESIZED NODES: 0
RESIZED NODES: 0
MADE NEW NODES: 0

Generation 3 (1 -> 1 nodes, 0 ranks)

BUILT FROM_INDEX: 0
COUNTED NEW NODES: 0
COUNTED TEMP NODES: 0
RESIZED NODES: 0
RESIZED NODES: 0
MADE NEW NODES: 0
RESIZE NODES: 0
SORT NODES: 0
MERGE, UPDATE RANK: 0

Generation 4 (1 -> 1 nodes, 1 ranks)
RNA-Seq alignment • 1.5k views

What does stuck mean? Did you check with top if it is still running, or is the tool crashing and throwing errors?


The tool is not crashing. It just stays there for hours (20 hrs) without progressing. It doesn't write any more log output, and it doesn't update any of the ".ht2" files either.

I ran top and saw that the job is running and using 100% of one core, but nothing else changes over time. Here is my top output:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
16778 nrpandes  20   0 22.606g 0.022t   3620 R 100.0  2.9 516:04.89 hisat2-build-s

%CPU is 100 and the status is R (running), so the process is still running. Wait until it finishes, or restart with more cores to speed it up.
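
Besides top, one rough way to tell whether the build is still producing output is to compare the sizes of the index files over a time window. The sketch below assumes the index files match `<prefix>.*.ht2`, as in the log above; the function names `snapshot` and `changed_since` are hypothetical. Unchanged sizes do not prove a hang, because the builder may be working in memory without writing anything.

```shell
# snapshot <index_prefix>: print "<bytes> <file>" for each existing index file.
snapshot() {
    for f in "$1".*.ht2; do
        [ -e "$f" ] || continue          # skip the unexpanded pattern if no files match
        printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done
}

# changed_since <index_prefix> <seconds>: succeed if any file size changed in that window.
changed_since() {
    local before after
    before=$(snapshot "$1")
    sleep "$2"
    after=$(snapshot "$1")
    [ "$before" != "$after" ]
}
```

For example, `changed_since arabidopsisz 600 && echo "still writing"` waits ten minutes and reports whether any of the index files changed size in that time.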


Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

Sure. Thank you for the suggestion and edit. I will use that in my future posts. Thank you!

4.0 years ago

The issue has been solved by changing my FASTA file. I was using a masked FASTA file; after switching to the unmasked one, the command ran smoothly. I would appreciate it if the experts here could explain the reason.
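
The file in the log above, `Arabidopsis_thaliana.TAIR10.dna_rm.toplevel.fa`, follows Ensembl's naming for repeat-masked sequence, in which masked regions are replaced by `N`. Why that would stall hisat2-build is not established in this thread, but a quick way to see how much the masked and unmasked files differ is to measure the fraction of `N` bases in each. The following is a minimal sketch; the function name `n_fraction` is hypothetical.

```shell
# n_fraction <file.fa>: print the fraction of sequence characters that are N or n.
n_fraction() {
    awk '
        /^>/ { next }                      # skip FASTA header lines
        {
            total += length($0)            # count sequence characters before editing the line
            n += gsub(/[Nn]/, "")          # gsub returns the number of N/n characters removed
        }
        END {
            if (total == 0) { print "0.0000"; exit }
            printf "%.4f\n", n / total
        }
    ' "$1"
}
```

Running `n_fraction` on the masked file and again on the unmasked file shows how much of the sequence the masking replaced.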
