Question: STAR aligner index generating issues
Uday Rangaswamy wrote:
Hi,
I am running the following command to generate a genome index using the STAR aligner:
Softwares/STAR/bin/Linux_x86_64/STAR --runThreadN 8 --runMode genomeGenerate --genomeDir /scratch/urangasw/data/hg38_index --genomeFastaFiles /home/urangasw/data/genome/human/Homo_sapiens.GRCh38.dna.primary_assembly.fa --sjdbGTFfile /home/urangasw/data/annotations/Homo_sapiens.GRCh38.102.gtf --sjdbOverhang 74 --limitGenomeGenerateRAM 25000000000
However, the process is getting killed at this particular step:
Jan 09 07:02:25 ..... started STAR run
!!!!! WARNING: Could not move Log.out file from ./Log.out into /scratch/urangasw/data/hg38_index/Log.out. Will keep ./Log.out
Jan 09 07:02:26 ... starting to generate Genome files
Jan 09 07:03:28 ..... processing annotations GTF
Jan 09 07:04:08 ... starting to sort Suffix Array. This may take a long time...
Jan 09 07:04:31 ... sorting Suffix Array chunks and saving them to disk...
Killed
I have tried varying the number of threads as well as --limitGenomeGenerateRAM, but that didn't help. Available RAM is as follows:
              total        used        free      shared  buff/cache   available
Mem:       65778304    21084068    34436756      357952    10257480    43839268
Swap:      33031164       54140    32977024
Please help me understand what this issue is and how to resolve it. Thanks in advance :)
You're running out of disk space, not RAM. See if you have write permissions and sufficient space in the location you're trying to write into.
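A quick way to check both, as a sketch (the path is the genomeDir from the command above; substitute your own):

```shell
# Sanity-check the STAR output location: free space and write permission.
# INDEX_DIR defaults to the current directory for illustration.
INDEX_DIR="${INDEX_DIR:-.}"   # e.g. /scratch/urangasw/data/hg38_index

# Enough free space on the filesystem backing the index directory?
# A human genome index needs on the order of 30 GB on disk.
df -h "$INDEX_DIR"

# Can we actually write there?
touch "$INDEX_DIR/.star_write_test" && rm "$INDEX_DIR/.star_write_test" \
  && echo "write OK"
```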
Hi _r_am,
I have 5 TB of disk space to myself in the scratch area, and I do have write permission on the output folder:
Other text files, such as chrLength.txt, have already been generated in the target output folder, which I don't think would be possible without write permission. Any alternative thoughts/suggestions, please?
That's odd indeed. Unless
./hg38_index
is mounted at a different mount point, you should not have disk-space-related problems. Maybe check with your admin - they might have something set up to kill jobs that exceed certain limits. This could be the case if you're using login nodes rather than specially designated compute nodes on a cluster.

Will check with the admin and get back to you. Thanks for your time.
You seem to be limiting your RAM request to 25 GB. Can you either remove that option or increase the number to, say, 35000000000 and check?
Hi Genomax,
I checked for both conditions. The process is still getting killed at the same step.
Are you the only user on this machine? Can you increase the number to 40 GB? I have not done this recently, but this page seems to indicate that should work.
No, it's a shared cluster. However, there are no other processes running in parallel (checked using the top command). I tried 40 GB as well, but unfortunately it still gets killed at the same point.
If this is a cluster and you are using a job scheduler, then you need to make sure that your scheduler command wrapper takes this additional RAM request into account. Since you posted just the bare STAR command above, all of the diagnostic advice you have received so far accounts only for that command.
Oh, alright. I was under the impression that the command would distribute its load across the cluster's available resources by itself. So basically, when using such shared clusters, I need to write a job script with explicit resource allocation in addition to the options in the command itself. Is that correct?
You should find out which job scheduler your cluster uses (e.g. SLURM, PBS, LSF). If this is a shared cluster, it almost certainly has one. Every job scheduler has a different syntax for requesting resources, which is done outside of the program you are trying to run. Ask fellow users/admins.
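As an illustration, the same request for 8 CPUs and 40 GB of RAM looks different under each scheduler (flag names and memory units vary by site configuration, so treat these as sketches, not exact incantations):

```shell
# SLURM
sbatch --cpus-per-task=8 --mem=40G genome_generate.sh

# PBS/Torque
qsub -l nodes=1:ppn=8,mem=40gb genome_generate.sh

# LSF (mem is typically in MB, but units are site-configurable)
bsub -n 8 -R "rusage[mem=40000]" genome_generate.sh
```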
Our cluster uses SLURM for scheduling. The following resource request worked for me:
Content of genome_generate.sh:
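A minimal sketch of such a SLURM wrapper, reusing the paths from the original command (the resource numbers here are illustrative, not necessarily the exact values used):

```shell
#!/bin/bash
#SBATCH --job-name=star_index
#SBATCH --cpus-per-task=8        # matches --runThreadN below
#SBATCH --mem=40G                # headroom above --limitGenomeGenerateRAM
#SBATCH --time=04:00:00
#SBATCH --output=star_index_%j.log

Softwares/STAR/bin/Linux_x86_64/STAR \
    --runThreadN 8 \
    --runMode genomeGenerate \
    --genomeDir /scratch/urangasw/data/hg38_index \
    --genomeFastaFiles /home/urangasw/data/genome/human/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
    --sjdbGTFfile /home/urangasw/data/annotations/Homo_sapiens.GRCh38.102.gtf \
    --sjdbOverhang 74 \
    --limitGenomeGenerateRAM 35000000000
```

Submitted with `sbatch genome_generate.sh`, so the job runs on a compute node with the memory and CPUs explicitly reserved.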
Thanks for the help :)