STAR genome generate memory requirements
4.1 years ago
pattakosn ▴ 20

Hi, I am new around here and in the field, and this is my first post! Nice to see you :)

I am trying to use STAR on the human reference genome, but I get std::bad_alloc errors. I tried it on my desktop with 16 GB of RAM and also on our cluster with 128 GB of RAM. Is it possible that this was not enough, or am I using the wrong options?

Thanks for any input.

alignment software error genome • 8.4k views

Could you show your commands and all output messages?


This is the command:

STAR --runThreadN 4 --runMode genomeGenerate --genomeDir /opt/genetics/ReferenceGenome/human_release-99_dna/genome --genomeFastaFiles /opt/genetics/ReferenceGenome/human_release-99_dna/Homo_sapiens.GRCh38.dna.toplevel.fa --sjdbGTFfile /opt/genetics/ReferenceGenome/human_release-99_gtf/Homo_sapiens.GRCh38.99.gtf --sjdbOverhang 100 --genomeChrBinNbits 12

and the error is

 Mar 31 14:49:03 ..... started STAR run 
 Mar 31 14:49:03 ... starting to generate Genome files
 terminate called after throwing an instance of 'std::bad_alloc'
    what():  std::bad_alloc
 Aborted

I can also see in top that the machine runs out of memory.


You can add the --limitGenomeGenerateRAM parameter and set a value for it (the default is probably about 31 GB) and see if that helps.

To be sure, did you download a compiled version of STAR for your OS, or did you compile it yourself?
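Before raising the limit, it may be worth checking how much memory the node can actually give you. A minimal sketch, assuming a Linux machine with /proc/meminfo; the function names are made up for illustration and 128000000000 is only an example value:

```shell
#!/bin/sh
# Hypothetical helpers (not part of STAR) for a quick memory sanity check.

# Print the available memory in bytes. /proc/meminfo reports MemAvailable in kB.
# An alternative meminfo-style file can be passed as the first argument.
mem_available_bytes() {
    awk '/^MemAvailable:/ { printf "%.0f\n", $2 * 1024 }' "${1:-/proc/meminfo}"
}

# Succeed (exit status 0) when the requested limit fits into the available memory.
# awk is used for the comparison so that very large byte counts are handled.
ram_limit_fits() {
    awk -v r="$1" -v a="$2" 'BEGIN { exit !(r <= a) }'
}

# Example run: compare an example limit against this machine's available memory.
if [ -r /proc/meminfo ]; then
    limit=128000000000   # example value for --limitGenomeGenerateRAM, in bytes
    avail=$(mem_available_bytes)
    if ram_limit_fits "$limit" "$avail"; then
        echo "limit $limit fits into $avail available bytes"
    else
        echo "limit $limit exceeds $avail available bytes"
    fi
fi
```

If the limit fits, pass the same number to STAR as --limitGenomeGenerateRAM. Like the other STAR options, it takes two leading dashes.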


I just tried the option you suggested, and the output is the same as in my reply to genomax:

pattakos@node02 16:10:07 ~/ :
./STAR/bin/Linux_x86_64_static/STAR --runThreadN 20 --runMode genomeGenerate --genomeDir ./genome --genomeFastaFiles ./Homo_sapiens.GRCh38.dna.toplevel.fa --sjdbGTFfile ./Homo_sapiens.GRCh38.99.gtf --sjdbOverhang 100  -limitGenomeGenerateRAM 128000000000 > OUTPUT-128gb.txt 2>&1 &
pattakos@node02 16:10:54 ~/ : 
tail -f OUTPUT-128gb.txt 
Mar 31 16:10:54 ..... started STAR run
Mar 31 16:10:54 ... starting to generate Genome files

EXITING because of FATAL PARAMETER ERROR: limitGenomeGenerateRAM=31000000000is too small for your genome
SOLUTION: please specify --limitGenomeGenerateRAM not less than 168632718037 and make that much RAM available 

Mar 31 16:24:47 ...... FATAL ERROR, exiting

Yes, I used the prebuilt binaries from STAR's GitHub repository, which I downloaded today, but I also had the same errors a few days ago when I built it myself.


16G is not enough, but 128G (if you are able to access at least 40-60G of it) should definitely be.


I would have thought so, but this is the output of running the above command on an interactive node:

./STAR/bin/Linux_x86_64_static/STAR --runThreadN 20 --runMode genomeGenerate --genomeDir ./genome --genomeFastaFiles ./Homo_sapiens.GRCh38.dna.toplevel.fa --sjdbGTFfile ./Homo_sapiens.GRCh38.99.gtf --sjdbOverhang 100 --genomeChrBinNbits 15 > OUTPUT-15binNbits.txt 2>&1 &
pattakos@node02 15:52:37 ~/ : 
tail -f OUTPUT-15binNbits.txt 
Mar 31 15:52:37 ..... started STAR run
Mar 31 15:52:37 ... starting to generate Genome files

EXITING because of FATAL PARAMETER ERROR: limitGenomeGenerateRAM=31000000000is too small for your genome
SOLUTION: please specify --limitGenomeGenerateRAM not less than 168422653312 and make that much RAM available 

Mar 31 16:05:50 ...... FATAL ERROR, exiting

Please note that I get the same error message whether I use STAR or STARlong, with the static binary or not, and also without the genomeChrBinNbits option.


This is odd, since I was going to suggest that you try removing the --genomeChrBinNbits option. It looks like STAR wants 168G of free RAM. Are you using the latest STAR available?
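For context, the STAR manual suggests scaling --genomeChrBinNbits as min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))) for genomes with a large number of contigs. A rough sketch of that arithmetic follows; the function name is made up, and rounding the result down to an integer is my assumption, since the manual only says the parameter is an int:

```shell
#!/bin/sh
# Hypothetical helper (not part of STAR): evaluate the manual's rule of thumb
#   min(18, log2(max(GenomeLength / NumberOfReferences, ReadLength)))
# Arguments: genome length in bases, number of reference sequences, read length.
chr_bin_nbits() {
    awk -v g="$1" -v n="$2" -v r="$3" 'BEGIN {
        x = g / n                          # average reference length
        if (r > x) x = r                   # max(..., ReadLength)
        b = int(log(x) / log(2) + 1e-9)    # log2, rounded down (assumption)
        if (b > 18) b = 18                 # capped at 18
        print b
    }'
}

# Example with round, illustrative numbers: about 3.1 Gb in 25 sequences, 100 bp reads.
chr_bin_nbits 3100000000 25 100
```

With only a few chromosome-sized sequences the result is capped at 18, so this rule gives no reason to lower the value for an assembly like that.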


Yes, I am using the latest one. Do you think I should contact the STAR developers?

4.1 years ago
GenoMax 141k

I think the problem is that you are using the toplevel assembly file, which is described as follows:

"These files contain all sequence regions flagged as toplevel in an Ensembl schema. This includes chromosomes, regions not assembled into chromosomes and N padded haplotype/patch regions."

This file is 60G. Are you sure you need it? Normally the primary assembly is sufficient for most analyses. It is available as Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
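If you want to see the difference between the two files for yourself, counting sequences and bases is a quick way to do it. A small sketch, assuming uncompressed FASTA input (decompress the .gz first); the function name is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print "<number of sequences> <total bases>" for a FASTA file.
# Header lines start with ">"; every other line is counted as sequence,
# including the N padding.
fasta_summary() {
    awk '/^>/ { n++; next }
         { bases += length($0) }
         END { printf "%d %.0f\n", n, bases }' "$1"
}

# Run on a file when one is given on the command line, e.g.
#   sh fasta_summary.sh Homo_sapiens.GRCh38.dna.toplevel.fa
if [ "$#" -gt 0 ]; then
    fasta_summary "$1"
fi
```

Running it on both the toplevel and the primary assembly file shows how much extra sequence the toplevel file carries.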

Edit: I tried the two files from the command lines above with 200G RAM. STAR still generated an error, incorrectly claiming

please specify --limitGenomeGenerateRAM not less than 168632718037 and make that much RAM available

so, if you want to use the toplevel file, you may want to post an issue on GitHub and see what Alex has to say.


I do not know what the primary assembly and toplevel files are or how they differ; I am trying to understand that now and decide which one I need. I tried the primary assembly file, and it required up to 30-35 GB of RAM with 20 threads.

I will post an issue then and we will see how it goes. I sincerely thank you for your help.


That is more in line with what I expect.


Thanks for pointing this out, @genomax. Using Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa resolved a similar error for me.
