STAR aligner index generating issues
20 months ago

Hi,

I am running the following command to generate a genome index with the STAR aligner:

Softwares/STAR/bin/Linux_x86_64/STAR --runThreadN 8 --runMode genomeGenerate --genomeDir /scratch/urangasw/data/hg38_index --genomeFastaFiles /home/urangasw/data/genome/human/Homo_sapiens.GRCh38.dna.primary_assembly.fa --sjdbGTFfile /home/urangasw/data/annotations/Homo_sapiens.GRCh38.102.gtf --sjdbOverhang 74 --limitGenomeGenerateRAM 25000000000


However, the process is getting killed at this particular step:

Jan 09 07:02:25 ..... started STAR run
!!!!! WARNING: Could not move Log.out file from ./Log.out into /scratch/urangasw/data/hg38_index/Log.out. Will keep ./Log.out

Jan 09 07:02:26 ... starting to generate Genome files
Jan 09 07:03:28 ..... processing annotations GTF
Jan 09 07:04:08 ... starting to sort Suffix Array. This may take a long time...
Jan 09 07:04:31 ... sorting Suffix Array chunks and saving them to disk...
Killed


I have tried varying the number of threads as well as --limitGenomeGenerateRAM, but that didn't help. Available RAM is as follows:

             total        used        free      shared  buff/cache   available
Mem:       65778304    21084068    34436756      357952    10257480    43839268
Swap:      33031164       54140    32977024


RNA-Seq alignment star genome • 1.2k views

You're running out of disk space, not RAM. See if you have write permissions and sufficient space in the location you're trying to write into.
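For example, both can be checked from the shell before rerunning STAR (a sketch; the OUTDIR path is the one from the command above and should be adjusted to your own setup):

```shell
# Output directory from the STAR command above -- adjust to your own path.
OUTDIR=/scratch/urangasw/data/hg38_index

# Free space on the filesystem backing the output directory
df -h "$OUTDIR"

# Write test: touch fails loudly if the directory is not writable
touch "$OUTDIR/.write_test" && rm "$OUTDIR/.write_test" && echo "writable"
```

Note that df reports the space on whatever filesystem the directory is mounted on, which on clusters is often different from your home filesystem.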


Hi _r_am,

I have 5TB of disk space to myself in the scratch area, and I do have write permission on the output folder:

[urangasw@login1 data]$ ls -al
total 12
drwxr-xr-x 3 urangasw fsg 4096 Jan  8 11:17 .
drwxr-xr-x 3 urangasw fsg 4096 Jan  8 11:16 ..
drwxrwxrwx 2 urangasw fsg 4096 Jan  8 11:20 hg38_index


Other files, such as chrLength.txt, have also been generated in the target output folder, which wouldn't be possible without write permission. Any alternative thoughts/suggestions, please?


That's odd indeed. Unless ./hg38_index is mounted at a different mount point, you should not have disk-space-related problems. Maybe check with your admin - they might have something set up to kill jobs that exceed certain limits. This could be the case if you're running on login nodes rather than on specially designated compute nodes of a cluster.


Will check with the admin and get back to you. Thanks for your time.


You seem to be limiting your RAM request to 25GB. Can you either remove that limit or increase the number to, say, 35000000000 and check?


Hi Genomax,

I tried both; the process is still getting killed at the same step.


Are you the only user on this machine? Can you increase the number to 40GB? I have not done this recently, but this page seems to indicate that it should work.


No, it's a shared cluster. However, there are no other processes running in parallel (checked with top). I tried 40GB as well, but unfortunately it still gets killed at the same point.

20 months ago
GenoMax

If this is a cluster and you are using a job scheduler, then you need to make sure that your scheduler command wrapper accounts for this additional RAM request. Since you posted just the bare STAR command above, all of the diagnostic advice you have received so far addresses only that command.


Oh, alright. I was under the impression that the command would distribute its load across the cluster's available resources by itself. So basically, on such shared clusters I need to write a job script with explicit resource requests, in addition to the options contained in the command itself. Is that correct?


You should find out which job scheduler your cluster uses (e.g. SLURM, PBS, LSF, etc.). If this is a shared cluster, it almost certainly uses one. Every job scheduler has a different syntax for requesting resources, which is done outside of the program you are trying to run. Ask fellow users/admins.


Our cluster is using SLURM for scheduling. The following resource request worked for me:

sbatch --ntasks=1 --cpus-per-task=32 --mem=32000mb --partition=regular1 --time=05:00:00 genome_generate.sh


Content of genome_generate.sh:

#!/bin/sh

STAR --runThreadN 32 --runMode genomeGenerate --genomeDir /scratch/urangasw/data/hg38_index --genomeFastaFiles /home/urangasw/data/genome/human/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
--sjdbGTFfile /home/urangasw/data/annotations/Homo_sapiens.GRCh38.102.gtf --sjdbOverhang 74
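For reference, the same resources can equivalently be requested with #SBATCH directives inside the script itself, so it can be submitted with a plain `sbatch genome_generate.sh` (a sketch using the values from the sbatch line above; partition and time limits will differ on other clusters):

```shell
#!/bin/sh
# Resource requests embedded as #SBATCH headers instead of sbatch flags.
# Values mirror the sbatch command line above.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=32000mb
#SBATCH --partition=regular1
#SBATCH --time=05:00:00

STAR --runThreadN 32 --runMode genomeGenerate \
     --genomeDir /scratch/urangasw/data/hg38_index \
     --genomeFastaFiles /home/urangasw/data/genome/human/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
     --sjdbGTFfile /home/urangasw/data/annotations/Homo_sapiens.GRCh38.102.gtf \
     --sjdbOverhang 74
```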


Thanks for the help :)