My task is to map Arabidopsis thaliana reads (FASTQ files) to a reference genome using HISAT2. I have downloaded the reference FASTA file and the annotation GTF file from the Ensembl website.
First I run the Python scripts to extract splice sites and exons. The commands are as follows.
python hisat2_extract_splice_sites.py File.gtf > splices.tsv
python hisat2_extract_exons.py file.gtf > exons.tsv
Then I run the hisat2-build command. The command is as follows.
hisat2-build --ss splices.tsv --exon exons.tsv Fastafile.fa Some_name_to_build_index
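As a rough sketch, the steps above could be chained in one shell script like this. The variable names, the THREADS value and the build_commands function are illustrative assumptions, not the exact job script. The function only prints the commands so they can be checked before anything is run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder file names taken from the commands above.
# A single GTF variable is an assumption that both extraction
# steps read the same annotation file.
GTF="File.gtf"
FASTA="Fastafile.fa"
INDEX="Some_name_to_build_index"
THREADS=4   # value passed to hisat2-build -p (assumed here)

# Print the three commands in order, one per line.
# Nothing is executed by this function.
build_commands() {
    echo "python hisat2_extract_splice_sites.py ${GTF} > splices.tsv"
    echo "python hisat2_extract_exons.py ${GTF} > exons.tsv"
    echo "hisat2-build -p ${THREADS} --ss splices.tsv --exon exons.tsv ${FASTA} ${INDEX}"
}

build_commands
```

To actually run the printed commands, the output could be piped into a shell, for example by replacing the last line with build_commands | bash, assuming the two hisat2 Python scripts are in the working directory and hisat2-build is on the PATH.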
When I run this script, it executes for some time and then gets stuck. The Slurm log shows execution up to the step below. I can see data in some of the ".ht2" files. I want to know how I can change the command so that it runs and completes successfully.
Total RAM available: 750 GB
Max total RAM used by my command: 160 GB
My command is using one full core in the node.
I also tried adding -p 4 to the command. With this, the total memory used was 160 * 4 = 640 GB, but the job was still stuck at the same step below. Any help is much appreciated. I am totally new to this field. Thank you!
Output files: "arabidopsisz.*.ht2"
Line rate: 7 (line is 128 bytes)
Lines per side: 1 (side is 128 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Local offset rate: 3 (one in 8)
Local fTable chars: 6
Local sequence length: 57344
Local sequence overlap between two consecutive indexes: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  /home5/nrpandes/from_chris/bowtie1/Arabidopsis_thaliana.TAIR10.dna_rm.toplevel.fa
Reading reference sizes
  Time reading reference sizes: 00:00:02
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:01
  Time to read SNPs and splice sites: 00:00:18
  is not reverse-deterministic, so reverse-determinize...
Generation 0 (1 -> 1 nodes, 0 ranks)
COUNTED NEW NODES: 0
COUNTED TEMP NODES: 0
RESIZED NODES: 0
RESIZED NODES: 0
MADE NEW NODES: 0
Generation 1 (1 -> 1 nodes, 0 ranks)
COUNTED NEW NODES: 0
COUNTED TEMP NODES: 0
RESIZED NODES: 0
RESIZED NODES: 0
MADE NEW NODES: 0
Generation 2 (1 -> 1 nodes, 0 ranks)
COUNTED NEW NODES: 0
COUNTED TEMP NODES: 0
RESIZED NODES: 0
RESIZED NODES: 0
MADE NEW NODES: 0
Generation 3 (1 -> 1 nodes, 0 ranks)
BUILT FROM_INDEX: 0
COUNTED NEW NODES: 0
COUNTED TEMP NODES: 0
RESIZED NODES: 0
RESIZED NODES: 0
MADE NEW NODES: 0
RESIZE NODES: 0
SORT NODES: 0
MERGE, UPDATE RANK: 0
Generation 4 (1 -> 1 nodes, 1 ranks)