STAR aligner: generating the genome index takes a long time
6.8 years ago
xcalle91 ▴ 20

Hi, I'm trying to use the STAR ultrafast aligner. First I need to generate a genome index to align to, so, as described in the manual, I ran this command:

/pathToStarDir/STAR --runMode genomeGenerate --genomeDir /path/to/GenomeDir --genomeFastaFiles /path/to/genome/*.fa --runThreadN 11

GenomeDir contains one fasta file for each human chromosome.

I started it yesterday morning, and after 24 hours it still has not finished.

I am wondering whether I did something wrong and it is stuck at some point it will never get past. I saw a few posts with similar problems, but none of them arrived at a solution that works for me.

I am running it on a computer with an 8-core i7 at 3.60 GHz and 31.3 GB of RAM.

Thanks in advance

RNA-Seq Assembly • 18k views

What is your genome size? STAR needs a lot of memory to generate and sort the suffix array. 32 GB is very little RAM for this purpose, and my guess is that the process started swapping. There are some parameters described in the documentation that lower the RAM requirements, e.g. --genomeChrBinNbits 12; see http://seqanswers.com/forums/showthread.php?t=27470
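To answer the genome-size question, the total number of bases can be counted directly from the FASTA files. The sketch below is only an illustration: the function names `genome_size_bp` and `estimated_ram_bytes` are made up for this example, and the factor of 10 bytes per base is the rough rule of thumb given in the STAR manual, not an exact requirement.

```shell
#!/usr/bin/env bash

# Count sequence characters in one or more FASTA files,
# skipping header lines (those starting with '>') and all whitespace.
genome_size_bp() {
    grep -hv '^>' "$@" | tr -d '[:space:]' | wc -c | tr -d ' '
}

# Rough RAM estimate in bytes, assuming about 10 bytes per base
# (a rule of thumb, treated here as an assumption).
estimated_ram_bytes() {
    echo $(( $1 * 10 ))
}
```

It could be used as `size=$(genome_size_bp /path/to/genome/*.fa); estimated_ram_bytes "$size"`. For a genome of about 3 billion bases this gives roughly 30 GB, which leaves very little headroom on a 32 GB machine.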

6.8 years ago
Chris Cole ▴ 790

You have two potential problems here.

1) you're setting --runThreadN 11, but your machine only has 8 cores. You may have hyperthreading which allows 16 threads, but I don't find it's all that useful. Best to stick to a maximum --runThreadN 8

2) The human genome requires at least 30GB of free RAM to run. Indexing may require more. You're probably running out of memory, which then spills out into swap which is *VERY* slow. Get more RAM, be patient or use a different aligner which requires less memory.
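Both points can be checked before starting a run. The following is a minimal sketch, not a definitive recipe: `cap_threads` is a hypothetical helper, `getconf _NPROCESSORS_ONLN` reports logical CPUs (so hyperthreads are counted), and `free` is assumed to exist only on Linux.

```shell
#!/usr/bin/env bash

# Return the smaller of the requested thread count and the available CPU count.
cap_threads() {
    local requested=$1 available=$2
    if [ "$requested" -gt "$available" ]; then
        echo "$available"
    else
        echo "$requested"
    fi
}

# Logical CPUs visible to the system (includes hyperthreads, not only physical cores).
cpus=$(getconf _NPROCESSORS_ONLN)
threads=$(cap_threads 11 "$cpus")
echo "Using --runThreadN $threads"

# On Linux, print memory and swap usage in GB. Heavy swap use while
# indexing suggests that the machine has run out of RAM.
if command -v free >/dev/null 2>&1; then
    free -g
fi
```

On a machine with hyperthreading enabled the reported CPU count may be twice the number of physical cores, so it may still be worth lowering the value by hand.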

4.5 years ago
msimmer92 ▴ 300

I've got a question. I am in the same situation, and I don't know whether this is part of the normal process or whether I am using too few threads. My supervisor recommended that I use --runThreadN 2 (two threads). My computer is a MacBook Pro (17-inch, Mid 2009) running OS X El Capitan (v10.11.6). Processor: 3.06 GHz Intel Core 2 Duo. Memory: 8 GB 1067 MHz DDR3 (two slots, each holding a 4 GB module). Graphics: NVIDIA GeForce 9400M 256 MB. Storage: 500 GB (393 GB free). How do you calculate the optimal number of threads for this?


You'll probably need more memory before you need extra threads.

STAR is only fast at the expense of using a lot of memory. If you cannot acquire enough memory, then perhaps you should look at different aligners (e.g. bowtie2).

How big is your genome? If you are working with a human genome, the author of STAR (alexdobin) says that you can get by with 16GB of RAM in sparse mode (at the expense of speed), but you would need more like 30GB using the defaults. (see http://seqanswers.com/forums/showthread.php?t=27470)
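As an illustration of what "sparse mode" could look like, the sketch below assembles a genomeGenerate command with a sparser suffix array and prints it without running it. The paths are the placeholders from the original question; the values `--genomeSAsparseD 2`, `--limitGenomeGenerateRAM 16000000000` and `--runThreadN 8` are assumptions chosen for a 16 GB machine and should be checked against the STAR manual for your version.

```shell
#!/usr/bin/env bash

# Assemble (but do not run) a genomeGenerate command with a sparse suffix array.
# --genomeSAsparseD > 1 trades mapping speed for lower RAM use.
# --limitGenomeGenerateRAM caps the RAM (in bytes) used for index generation.
star_cmd=(
    /pathToStarDir/STAR
    --runMode genomeGenerate
    --genomeDir /path/to/GenomeDir
    --genomeFastaFiles /path/to/genome/*.fa
    --runThreadN 8
    --genomeSAsparseD 2
    --limitGenomeGenerateRAM 16000000000
)

# Print the command for inspection.
printf '%s ' "${star_cmd[@]}"; echo

# To actually run it, uncomment the next line:
# "${star_cmd[@]}"
```

Printing the command first makes it easy to confirm the paths and limits before committing to a run that may take hours.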


It was the human genome. I ended up moving to the lab's cluster, where I could use STAR without any problems. Now that you say this, I understand why it didn't work properly on my personal computer. Thank you for your input and the link!
