STAR alignment getting KILLED in process. Please help!
22 months ago
Soumajit ▴ 40

Hello good peeps,

Recently I started analyzing RNA-seq data, mostly self-taught (articles, online tutorials), and unfortunately I do not have access to a high-performance computing cluster. I am working on a machine with an Intel i9 12th-gen processor and 32 GB of DDR5 RAM.

I am running Ubuntu in a VM and doing the analysis in the terminal. I could do the whole analysis on Galaxy, but I thought it would be better to learn how to do it with scripts. For the past few days I have been stuck at the alignment step: I am trying to build an index of hg38. Here is my command.

STAR --runThreadN 8 --runMode genomeGenerate --genomeDir /home/oliver/calc/rawfiles/annot --genomeFastaFiles /home/oliver/calc/rawfiles/hg38.fa --sjdbGTFfile /home/oliver/calc/rawfiles/hg38.refGene.gtf --sjdbOverhang 100

The process was killed at the following:

Jun 17 13:53:29 ... loading chunks from disk, packing SA...
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
/usr/bin/STAR: line 7: 68768 Aborted (core dumped) "${cmd}" "$@"

And in the last attempt, the command was this:

STAR --runThreadN 6 --runMode genomeGenerate --genomeDir /home/oliver/calc/rawfiles/annot --genomeFastaFiles /home/oliver/calc/rawfiles/hg38.fa --sjdbGTFfile /home/oliver/calc/rawfiles/hg38.refGene.gtf --sjdbOverhang 100 --limitGenomeGenerateRAM 18000000000

In this run it got as far as the SA_47 file in the index directory (genomeDir = annot), but then the process crashed again, and the earlier SA files were gone.

Could anyone please help with this issue and suggest how to solve it? Is there anything wrong with the command, or is it simply too little RAM? In the VM, the memory looks like this.

oliver@oliver-VirtualBox:~$ free
               total        used        free      shared  buff/cache   available
Mem:        24414620      996720    22690664       30064      727236    23030164
Swap:        2097148       57180     2039968

Sorry for the long post. Hoping for helpful responses. Thanks.

RNA-seq alignment STAR

I think it is just the memory, and running this in a VM does not help either, because some RAM has to stay allocated to the host system. Sorting the suffix array is a very memory-intensive step in STAR and may be futile with your amount of RAM. You could likely run an alignment if you can get a pre-built genome index from somewhere.

Here is a link: https://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STARgenomes/Human/ but I am not sure whether those indices are compatible with the latest STAR version.
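If building locally is still worth a try, here is a rough sketch (untested on this data): --genomeSAsparseD 2 trades some alignment speed for roughly half the suffix-array memory, and the ~80%-of-available RAM cap is a rule of thumb of mine, not a STAR default. The paths are the ones from the question.

```shell
# Cap STAR's RAM at ~80% of what is currently available (kB -> bytes).
avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo 2>/dev/null)
avail_kb=${avail_kb:-0}
limit_bytes=$((avail_kb * 1024 * 8 / 10))
echo "RAM cap for STAR: $limit_bytes bytes"

# Sparse suffix array (--genomeSAsparseD 2) roughly halves index memory
# at some cost in alignment speed. Guarded: only runs if STAR is installed.
if command -v STAR >/dev/null 2>&1; then
    STAR --runThreadN 4 --runMode genomeGenerate \
        --genomeDir /home/oliver/calc/rawfiles/annot \
        --genomeFastaFiles /home/oliver/calc/rawfiles/hg38.fa \
        --sjdbGTFfile /home/oliver/calc/rawfiles/hg38.refGene.gtf \
        --sjdbOverhang 100 \
        --genomeSAsparseD 2 \
        --limitGenomeGenerateRAM "$limit_bytes"
fi
```

Note that the prebuilt indices at the link above were themselves built with a sparse suffix array (the "sparseD3" in the directory name), so this is the same trick.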


Hello Michael, thanks for taking the time to reply and for your suggestions. I have a few questions:

  1. In your experience, do you think another aligner, such as HISAT2, would work better on my system? I read somewhere that it requires less RAM than STAR.
  2. Thanks for the link. Do you think downloading the files one by one into a single genomeDir folder and then running the alignment would work?

Again, thanks a lot.


It might be worth trying to build your index with HISAT2, but it also needs a lot of memory for index generation. There was a recent post on that, and there are some options to reduce the memory requirements; if it works at all, it will most likely be without splice-site and exon annotation. Another option is Salmon or Kallisto; these tools are better suited to consumer hardware.
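The Salmon route could look like the sketch below (not a tested pipeline: the transcriptome file name is a placeholder, and note that Salmon indexes a transcriptome FASTA, not the genome FASTA you fed to STAR).

```shell
# Placeholders: a transcriptome FASTA and an index directory name.
TXOME=gencode_transcripts.fa.gz
IDX=salmon_index

# Guarded so the sketch does nothing unless Salmon is installed.
if command -v salmon >/dev/null 2>&1; then
    # Build the index once; k=31 is the usual default for >=75 bp reads.
    salmon index -t "$TXOME" -i "$IDX" -k 31
    # Quantify one paired-end sample (read names taken from the question).
    salmon quant -i "$IDX" -l A \
        -1 Control_R1.fastq.gz -2 Control_R2.fastq.gz \
        -p 4 -o quant_Control
fi
```

Building a plain human transcriptome index typically takes a few GB of RAM rather than tens, which is the point here.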

For downloading, try to mirror the full https://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STARgenomes/Human/GRCh38_Ensembl99_sparseD3_sjdbOverhang99/ directory with wget -r
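For instance (standard wget flags: -np stays below the given directory, -nH and --cut-dirs trim the long remote path so the files land in the current folder; the DO_DOWNLOAD guard is only there so the sketch does nothing unless you opt in):

```shell
# URL from the answer above; --cut-dirs=8 matches its path depth.
URL="https://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STARgenomes/Human/GRCh38_Ensembl99_sparseD3_sjdbOverhang99/"

# Set DO_DOWNLOAD=1 to actually fetch (tens of GB of data).
if command -v wget >/dev/null 2>&1 && [ "${DO_DOWNLOAD:-0}" = 1 ]; then
    wget -r -np -nH --cut-dirs=8 -R "index.html*" "$URL"
fi
```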


I tried to run the STAR alignment with the pre-built index from the link you shared. This was the command:

STAR --runThreadN 4 --genomeDir Index --readFilesIn /mnt/d/RNA_seq/Files/'Control_R1.gz' /mnt/d/RNA_seq/Files/'Control_R2.gz' --readFilesCommand zcat --outFileNamePrefix alingments/trial_1 --outSAMtype BAM Unsorted

But it failed saying:

EXITING because of FATAL error, could not open file Index/chrName.txt
SOLUTION: re-generate genome files with STAR --runMode genomeGenerate

So, as you suspected, it is probably no longer compatible?

But, thanks anyway.


If all you need is gene-level quantification, you can use Salmon instead, which uses considerably less memory and provides more accurate quantifications.


Pair that with working either on a native Linux distro or via WSL2. These VMs are just another unnecessary layer of complexity.


ATpoint and rpolicastro, thanks for your input. I get your point, and next I am going to try Salmon on WSL2 (skipping the VM for now). Thanks.

22 months ago
tomas4482 ▴ 390

A virtual machine cannot be assigned all of the host's RAM. As a rule of thumb, give the VM at most ~80% of physical RAM, so a 32 GB host should allocate ~25 GB to the VM; otherwise the host will start swapping.

Two 16 GB RAM sticks won't cost you much. If you really don't want to extend your physical RAM, another option is to enlarge the swap space so data can spill into it. But I strongly recommend NOT using this method: swap uses physical storage (SSD or HDD) as RAM, which is much slower. A simple alignment job that would normally finish within 30 minutes can take hours or even days on swap.
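If you go the swap route anyway, here is a minimal sketch (assumes Ubuntu and needs root; the ADD_SWAP guard means it does nothing unless you deliberately enable it):

```shell
# Set ADD_SWAP=1 and run as root to actually create a 16 GB swap file.
if [ "${ADD_SWAP:-0}" = 1 ] && [ "$(id -u)" -eq 0 ]; then
    fallocate -l 16G /swapfile       # reserve the file
    chmod 600 /swapfile              # swap files must not be world-readable
    mkswap /swapfile                 # format it as swap
    swapon /swapfile                 # enable it now
    # Make it survive reboots:
    echo '/swapfile none swap sw 0 0' >> /etc/fstab
fi
```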

As for your index issue, the argument --genomeDir requires the absolute path to the folder that holds your STAR index. The error message tells you it cannot find a directory named 'Index'. If you store your index under /home/Soumjit/genome_ref/STAR_index/ , you should pass: --genomeDir /home/Soumjit/genome_ref/STAR_index/
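A quick way to check that --genomeDir points at a complete index before aligning (path is the example above; chrName.txt is one of the files genomeGenerate writes):

```shell
# Placeholder path: replace with your actual index directory.
IDX=/home/Soumjit/genome_ref/STAR_index
if [ -f "$IDX/chrName.txt" ]; then
    echo "index looks usable"
else
    echo "no chrName.txt in $IDX: wrong path or incomplete index" >&2
fi
```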

22 months ago
Soumajit ▴ 40

Hello everyone who gave valuable input: in the end I used a computing cluster with a much higher memory allocation, and the script finally finished building the index. Thanks again for the suggestions.
