Question: Star index generation - 'std::bad_alloc' error
6 weeks ago, aranyak111 wrote:

I was trying to generate a genome index with STAR for the mutant library 99, 50 hours post fertilization (99H50), using the annotation from the Lawson lab. The command I used is as follows.

module load STAR; STAR --runThreadN 10 --runMode genomeGenerate
--genomeDir /gpfs/ysm/scratch60/polimanti/ag2646/99H50_new_annotation/z10starindex75/
--genomeFastaFiles /gpfs/ysm/scratch60/polimanti/ag2646/Lawsonreference/genome.fa
--sjdbGTFfile /gpfs/ysm/scratch60/polimanti/ag2646/Lawsonreference/genes.gtf
--sjdbOverhang 75

The batch script used to submit the job that creates the index is:

dsq --job-file z10starindex75.txt --job-name z10starindex75 -c 10 --mem=100G -t 10:00:00 --mail-type=ALL --mail-user=xxxxx

I ran this on my HPC cluster and it fails with the following error.

Jan 22 22:41:39 ..... started STAR run
Jan 22 22:41:39 ... starting to generate Genome files
Jan 22 22:42:04 ... starting to sort Suffix Array. This may take a long time...
Jan 22 22:42:09 ... sorting Suffix Array chunks and saving them to disk...
Jan 22 22:47:18 ... loading chunks from disk, packing SA...
Jan 22 22:47:42 ... finished generating suffix array
Jan 22 22:47:42 ... generating Suffix Array index
Jan 22 22:49:38 ... completed Suffix Array index
Jan 22 22:49:38 ..... processing annotations GTF
terminate called after throwing an instance of 'std::bad_alloc'


 what():  std::bad_alloc
/bin/sh: line 1: 186783 Aborted                 STAR --runThreadN 10 --runMode genomeGenerate --genomeDir /gpfs/ysm/scratch60/polimanti/ag2646/99H50_new_annotation/z10starindex75/ --genomeFastaFiles /gpfs/ysm/scratch60/polimanti/ag2646/Lawsonreference/genome.fa --sjdbGTFfile /gpfs/ysm/scratch60/polimanti/ag2646/Lawsonreference/genes.gtf --sjdbOverhang 75

I googled the error and found that it can originate from memory allocation, so I ran the job from a location on the cluster where I have enough space. The memory usage reported for the job is:

Job ID: 47861791
Array Job ID: 47861791_0
Cluster: farnam
User/Group: ag2646/nicoli
State: FAILED (exit code 134)
Nodes: 1
Cores per node: 10
CPU Utilized: 00:36:34
CPU Efficiency: 45.14% of 01:21:00 core-walltime
Job Wall-clock time: 00:08:06
Memory Utilized: 25.64 GB
Memory Efficiency: 25.64% of 100.00 GB
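
For readers decoding the job state: by shell convention, an exit code above 128 usually means the process was killed by signal (code - 128). Here 134 = 128 + 6, and signal 6 is SIGABRT, which matches the "Aborted" line in the log after the uncaught std::bad_alloc. A minimal sketch of that arithmetic (the function name decode_exit_status is hypothetical, not part of Slurm or STAR):

```shell
# Decode a shell-style exit status. Values above 128 conventionally mean
# "terminated by signal number (status - 128)".
# decode_exit_status is a hypothetical helper name used for illustration.
decode_exit_status() {
    status="$1"
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l <number> prints the signal name without the SIG prefix
        echo "signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exit code $status"
    fi
}

decode_exit_status 134
```

This only confirms that the process aborted itself; it does not say which allocation failed.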

I searched the internet for solutions and tried the following: (1) reducing the number of threads from 10 to 1 to lower memory use; (2) setting an explicit memory limit with --limitGenomeGenerateRAM 48000000000; (3) adding --genomeChrBinNbits 16. The error still occurs.

The first few lines of my GTF file are:

chr12   UMMS    gene    6160446 6177944 .       -       .       gene_id "LL0000000001"; gene_name "a1cf";
chr12   UMMS    exon    6160446 6161260 .       -       .       gene_id "LL0000000001"; gene_name "a1cf"; transcript_id "ENSDART00000152292";
chr12   UMMS    exon    6163727 6163869 .       -       .       gene_id "LL0000000001"; gene_name "a1cf"; transcript_id "ENSDART00000152292";
chr12   UMMS    exon    6165086 6165222 .       -       .       gene_id "LL0000000001"; gene_name "a1cf"; transcript_id "ENSDART00000152292";
chr12   UMMS    exon    6165305 6165498 .       -       .       gene_id "LL0000000001"; gene_name "a1cf"; transcript_id "ENSDART00000152292";
chr12   UMMS    exon    6167117 6167396 .       -       .       gene_id "LL0000000001"; gene_name "a1cf"; transcript_id "ENSDART00000152292";
chr12   UMMS    exon    6168940 6169037 .       -       .       gene_id "LL0000000001"; gene_name "a1cf"; transcript_id "ENSDART00000152292";
chr12   UMMS    exon    6169982 6170146 .       -       .       gene_id "LL0000000001"; gene_name "a1cf"; transcript_id "ENSDART00000152292";
chr12   UMMS    exon    6170412 6170650 .       -       .       gene_id "LL0000000001"; gene_name "a1cf"; transcript_id "ENSDART00000152292";
chr12   UMMS    exon    6170731 6170861 .       -       .       gene_id "LL0000000001"; gene_name "a1cf"; transcript_id "ENSDART00000152292";

Some lines from the genome FASTA file are as follows:

>chr1
gatcttaaacatttattccccctgcaaacattttcaatcattacattgtc
atttcccctccaaattaaatttagccagaggcgcacaacatacgacctct
aaaaaaggtgctgtaacatgtacctatatgcagcaccactatatgagagc
ggcatagcagtgtttagtcacttggttgctttgtttatattaacttgaaa
gtgtgttttagctattgagtttaaacaaagggagcggtttacattgaatt
aaaggcaactactgatgggttgtgtaatgtttcaaagagctgttgcagca
tgagtggaaaataaaaccgtattagtgctgcctggcccagtttggcacaa
aatggagcgattccattaagagaacgattcagcataagtggaacagcTAA
AGtttatgaaaatttttaatctggatgtagagaatctcataacacagaaa
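
Since the failure happens at the "processing annotations GTF" step, one cheap sanity check is whether every sequence name in the GTF also appears as a FASTA header. The excerpts above both use the chrN style, but that does not guarantee the whole files agree. A rough sketch, assuming standard tab-separated GTF and FASTA headers whose name ends at the first space (the function name is hypothetical):

```shell
# List sequence names used in a GTF that have no matching FASTA header.
# Usage: gtf_names_missing_from_fasta genome.fa genes.gtf
# Hypothetical helper; assumes the FASTA file is not empty.
gtf_names_missing_from_fasta() {
    fasta="$1"
    gtf="$2"
    awk '
        # First file (FASTA): remember each header name up to the first space.
        NR == FNR { if (/^>/) { name = substr($1, 2); seen[name] = 1 }; next }
        # Second file (GTF): skip comment and blank lines.
        /^#/ || NF == 0 { next }
        # Report each sequence name not found in the FASTA, once.
        !($1 in seen) && !($1 in reported) { reported[$1] = 1; print $1 }
    ' "$fasta" "$gtf"
}
```

Empty output means every GTF sequence name was found in the FASTA; any name printed is worth a closer look before rerunning the index build.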

I have tried to provide as much detail as possible, and any help would be appreciated.
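
On the --genomeChrBinNbits value mentioned above: the STAR manual suggests scaling it, for genomes with many reference sequences, as min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))). The sketch below computes that from a FASTA on stdin. The function name, the default read length of 100, and rounding the logarithm down are all assumptions made for illustration.

```shell
# Suggest a --genomeChrBinNbits value from a FASTA read on stdin, following
#   min(18, log2(max(GenomeLength / NumberOfReferences, ReadLength)))
# Usage: suggest_chr_bin_nbits READ_LENGTH < genome.fa
# Hypothetical helper; the log2 is rounded down (an assumption).
suggest_chr_bin_nbits() {
    read_len="${1:-100}"   # assumed default read length
    awk -v read_len="$read_len" '
        /^>/ { nseq++; next }              # header line: one more reference
        { total += length($0) }            # sequence line: add its bases
        END {
            if (nseq == 0) exit 1          # no sequences, nothing to suggest
            x = total / nseq               # average reference length
            if (read_len + 0 > x) x = read_len + 0
            bits = 0
            while (x >= 2) { x = x / 2; bits++ }   # floor(log2(x))
            if (bits > 18) bits = 18
            print bits
        }'
}
```

For a reference made of a few dozen long chromosomes the result will simply be capped at 18, so the setting mainly matters for assemblies with very many scaffolds.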

modified 5 weeks ago • written 6 weeks ago by aranyak111

Edit: I redacted your email address from the post above.

Are you running this job outside a job scheduler? I don't see clear evidence that it is being run via a job scheduler, so perhaps the cluster head node is killing your job.

modified 6 weeks ago • written 6 weeks ago by GenoMax

I am running the job from a job scheduler. If the cluster head node is generating the error, how can I circumvent it?

written 5 weeks ago by aranyak111

Based on your original post it is hard to figure out how you are running this job. What scheduler are you using?

modified 5 weeks ago • written 5 weeks ago by GenoMax

I am using a Slurm-based job scheduler. I am submitting a batch job using multiple job queues:

dsq --job-file z10starindex75.txt --job-name z10starindex75 -c 10 --mem=100G -t 10:00:00 --mail-type=ALL --mail-user=xxx

I hope this answers your question.

modified 5 weeks ago by GenoMax • written 5 weeks ago by aranyak111

I am not familiar with this particular scripted way of submitting SLURM jobs, but at least it must be running under the scheduler. So the next obvious question is: what is the size of /gpfs/ysm/scratch60/polimanti/ag2646/Lawsonreference/genome.fa? STAR may need far more than 100G of RAM if your FASTA is > 10 GB in size (i.e., you have a reference with haplotypes, etc.).
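
To put a rough number on that: the STAR manual gives a rule of thumb of at least about 10 bytes of RAM per genome base for index generation, so a FASTA with more than 10 billion bases would already exceed a 100G allocation. A small sketch that counts bases and applies that factor (the function name is hypothetical, and the factor is only a lower-bound estimate, not a guarantee):

```shell
# Estimate the RAM (in bytes) STAR may need to index a genome, using the
# rule of thumb of roughly 10 bytes per base. Reads a FASTA on stdin.
# Usage: estimate_star_ram_bytes < genome.fa
# Hypothetical helper for illustration.
estimate_star_ram_bytes() {
    awk '
        /^>/ { next }                        # skip header lines
        { bases += length($0) }              # count sequence characters
        END { printf "%.0f\n", bases * 10 }  # %.0f avoids 32-bit %d limits
    '
}
```

Comparing the printed value against the --mem request shows quickly whether the genome size alone could explain an out-of-memory failure.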

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. Also redact your email (I did that above) and any other system-specific paths/names that may be sensitive information.

written 5 weeks ago by GenoMax