Question: Hisat2 build command gets stuck while executing.
4 months ago, niranjanpandeshwar wrote:

My task is to map Arabidopsis thaliana reads (FASTQ files) to a reference genome using HISAT2. I downloaded the reference FASTA file and the annotation GTF file from the Ensembl website.

First I ran the Python scripts that extract splice sites and exons. The commands are as follows.

python hisat2_extract_splice_sites.py file.gtf > splices.tsv

python hisat2_extract_exons.py file.gtf > exons.tsv

Then I ran the hisat2-build command. The command is as follows.

hisat2-build --ss splices.tsv --exon exons.tsv Fastafile.fa Some_name_to_build_index
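
The steps above can be collected into one small script. This is a sketch only: the function names `require_nonempty` and `build_index` and the default thread count are assumptions made for illustration, and it assumes the two HISAT2 extraction scripts sit in the current directory and that `hisat2-build` is on the PATH. It adds a check that the extracted files are not empty before the build starts.

```shell
#!/usr/bin/env bash

# Fail with a message if the given file is missing or has zero size.
require_nonempty() {
  [ -s "$1" ] || { echo "error: $1 is missing or empty" >&2; return 1; }
}

# Hypothetical wrapper: build_index <annotation.gtf> <genome.fa> <index_basename> [threads]
build_index() {
  local gtf=$1 fasta=$2 base=$3 threads=${4:-1}

  # Extract splice sites and exons from the annotation.
  python hisat2_extract_splice_sites.py "$gtf" > splices.tsv
  python hisat2_extract_exons.py "$gtf" > exons.tsv

  # Stop early if either extraction produced nothing.
  require_nonempty splices.tsv || return 1
  require_nonempty exons.tsv || return 1

  # Build the index with the splice-site and exon information.
  hisat2-build -p "$threads" --ss splices.tsv --exon exons.tsv "$fasta" "$base"
}
```

A call might look like `build_index file.gtf Fastafile.fa Some_name_to_build_index 4`.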

When I run this script, it executes for some time and then gets stuck. The Slurm log shows execution up to the step below. I can see data in some of the ".ht2" files. I want to know how I can change the command so that it runs to completion.

Total RAM available: 750 GB

Max total RAM used by my command: 160 GB

My command is using one full core in the node.

I tried adding -p 4 to the command. With this, the total memory used was 160 * 4 = 640 GB, but the job still got stuck at the same step shown below. Any help is much appreciated. I am totally new to this field. Thank you!

  Output files: "arabidopsisz.*.ht2"
  Line rate: 7 (line is 128 bytes)
  Lines per side: 1 (side is 128 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Local offset rate: 3 (one in 8)
  Local fTable chars: 6
  Local sequence length: 57344
  Local sequence overlap between two consecutive indexes: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8

Input files DNA, FASTA:

  /home5/nrpandes/from_chris/bowtie1/Arabidopsis_thaliana.TAIR10.dna_rm.toplevel.fa

Reading reference sizes

  Time reading reference sizes: 00:00:02

Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:01
  Time to read SNPs and splice sites: 00:00:18
    is not reverse-deterministic, so reverse-determinize...

Generation 0 (1 -> 1 nodes, 0 ranks)

COUNTED NEW NODES: 0
COUNTED TEMP NODES: 0
RESIZED NODES: 0
RESIZED NODES: 0
MADE NEW NODES: 0

Generation 1 (1 -> 1 nodes, 0 ranks)

COUNTED NEW NODES: 0
COUNTED TEMP NODES: 0
RESIZED NODES: 0
RESIZED NODES: 0
MADE NEW NODES: 0

Generation 2 (1 -> 1 nodes, 0 ranks)

COUNTED NEW NODES: 0
COUNTED TEMP NODES: 0
RESIZED NODES: 0
RESIZED NODES: 0
MADE NEW NODES: 0

Generation 3 (1 -> 1 nodes, 0 ranks)

BUILT FROM_INDEX: 0
COUNTED NEW NODES: 0
COUNTED TEMP NODES: 0
RESIZED NODES: 0
RESIZED NODES: 0
MADE NEW NODES: 0
RESIZE NODES: 0
SORT NODES: 0
MERGE, UPDATE RANK: 0

Generation 4 (1 -> 1 nodes, 1 ranks)

What does stuck mean? Did you check with top if it is still running, or is the tool crashing and throwing errors?

written 4 months ago by ATpoint

The tool is not crashing. It just stays there for hours (20 hrs) without progressing. It doesn't produce any more log output, and it doesn't update any of the ".ht2" files either.

I ran top and saw that the job is running and using 100% of one core, but none of the values change over time. Here is my top output:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
16778 nrpandes  20   0 22.606g 0.022t   3620 R 100.0  2.9 516:04.89 hisat2-build-s
written 4 months ago by niranjanpandeshwar

%CPU is 100 and the status is R (running), so the process is still working. Wait until it finishes, or restart with more cores to speed it up.

written 4 months ago by ATpoint
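
One way to tell a slow build from a stalled one is to check whether any index file has been written to recently. A minimal sketch, assuming the index files are named `<basename>.*.ht2` as in the log above; the function name `recently_updated` and its arguments are hypothetical. An empty result over a long window does not by itself prove the build has stalled, since the tool may spend a long time computing without writing to disk.

```shell
#!/usr/bin/env bash

# List index files modified within the last N minutes.
# Usage: recently_updated <directory> <index_basename> <minutes>
recently_updated() {
  local dir=$1 base=$2 minutes=$3
  # -mmin -N matches files whose contents changed less than N minutes ago.
  find "$dir" -maxdepth 1 -name "$base.*.ht2" -mmin "-$minutes"
}
```

For example, `recently_updated . arabidopsisz 30` prints the index files touched in the last half hour, or nothing if none were.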

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

written 4 months ago by RamRS

Sure. Thank you for the suggestion and edit. I will use that in my future posts. Thank you!

written 4 months ago by niranjanpandeshwar
3 months ago, niranjanpandeshwar wrote:

The issue has been solved. I changed my FASTA file: I had been using the masked FASTA file (the dna_rm file shown in the log) and switched to the unmasked one. The command then ran smoothly. I would appreciate it if the experts here could explain the reason.
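
To see how much the two FASTA files actually differ, it can help to measure what fraction of each is N. This is a rough sketch, assuming the masked file is hard-masked (repeats replaced by N, which is what Ensembl's dna_rm files contain); the function name `n_fraction` is made up for illustration. It only describes the input and does not explain why the build stalled.

```shell
#!/usr/bin/env bash

# Print the fraction of sequence characters that are N or n in a FASTA file.
# Usage: n_fraction <genome.fa>
n_fraction() {
  LC_ALL=C awk '
    !/^>/ {
      total += length($0)        # count bases before gsub modifies the line
      n += gsub(/[Nn]/, "")      # gsub returns the number of N/n it removed
    }
    END {
      if (total == 0) { print "0.0000" }
      else { printf "%.4f\n", n / total }
    }
  ' "$1"
}
```

Running `n_fraction` on both the masked and the unmasked file gives a quick side-by-side comparison.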
