Question: Building STAR genome index continually killed
2.4 years ago, Sa wrote:

Hi, I am new to STAR, and I am trying to align sequences. First, I moved the reference genome into my new project directory; this reference genome is a single .fa file created by concatenating the .fa files of each chromosome. I then created a genome directory, called genomeDir, and specified the path to this reference genome file; my path started from the root of the project directory, although I also tried giving the direct path to the reference genome, which I don't believe makes a difference. After making the genomeDir directory, I was getting an "unable to access and write to file" error, which I learned could be solved by creating a STAR file within the genome directory.

The command that I used to create the index was:

STAR --runMode genomeGenerate --genomeDir genomeDir --genomeFastaFiles hg19.fa --runThreadN 4

This process kept getting killed without a clear error message, so I tried two things: (1) running make in the STAR source directory and then running the commands from that same folder, and (2) adding --genomeSAsparseD 2 to the command. Neither of these worked. Do you know if this is simply a RAM issue, or why else this might be happening?

Thank you so much for your help!
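
When a process is killed without an error message, as described above, its exit status can hint at the cause. The following is a minimal sketch, not part of the original post: the function name describe_exit is hypothetical, and it relies only on the shell convention that a process ended by a signal is reported as 128 plus the signal number.

```shell
# Sketch: classify how a command ended, to tell an external kill apart
# from an ordinary error exit. The function name is hypothetical.
describe_exit() {
    if [ "$1" -eq 0 ]; then
        echo "ok"
    elif [ "$1" -gt 128 ]; then
        # Shells report death-by-signal as 128 + signal number, so 137
        # means signal 9 (SIGKILL), which the Linux OOM killer sends.
        echo "killed by signal $(($1 - 128))"
    else
        echo "exited with error $1"
    fi
}

# Usage (assumes STAR is on PATH; not executed here):
#   STAR --runMode genomeGenerate --genomeDir genomeDir \
#        --genomeFastaFiles hg19.fa --runThreadN 4
#   describe_exit $?
# On Linux, the kernel log may then confirm an out-of-memory kill:
#   dmesg | grep -i -E 'out of memory|killed process'
```

A status of 137 does not prove a memory problem on its own, since anything can send SIGKILL, but together with a matching kernel log entry it is a strong hint.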

index rna-seq star • 3.0k views
modified 2.4 years ago by biocyberman • written 2.4 years ago by Sa

This doesn't quite answer the STAR problem, but here is my approach for the same need:

Although it is quite a bulky tool, I would rely on bcbio-nextgen to manage my aligner indexes and reference genome sequences:

python /usr/local/share/bcbio --tooldir=/usr/local \
  --genomes hg19 --genomes hg38 --aligners star --aligners bowtie2

If you do not have much time, you can find premade indexes here

With that said, like @WouterDeCoster mentioned, I find hisat2 quite interesting and have been using it recently. It has premade indexes and can incorporate information about SNPs into the alignment. In the output BAM file, if a read covers a specific SNP, the SNP RS number is also used to annotate that read, which will be useful in some cases. Hisat2 is a part of "the new tuxedo" suite, which I find interesting as well.

modified 2.4 years ago • written 2.4 years ago by biocyberman

Please use tags appropriately, so that experts can easily find your question. In this case star would have been a logical choice, so I have added it to your question.

Note that, if I'm not mistaken, you are talking about creating the index, and not yet about alignment. Therefore I have adapted your title to better reflect what this question is about.

Finally, you suggest it might be a RAM issue, which I agree with, but then you should tell us how much RAM your system has.

written 2.4 years ago by WouterDeCoster

Sorry, I actually tried adding the STAR tag, but it wasn't showing up automatically. So, I just left it out.

Also, you are right, I meant to say indexing. Sorry about that as well. My ultimate goal is alignment.

I have 16GB of RAM. I actually just edited the last command to explicitly say "parameter 2" instead of just 2. And now the computer is just running the process for a really long time. Would you know why?

Thank you in advance for your help.
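
For a machine with limited memory, such as the 16GB mentioned here, STAR documents two options that trade speed for memory during index generation: --genomeSAsparseD (suffix array sparsity) and --limitGenomeGenerateRAM (maximum RAM in bytes). The sketch below only prints a command rather than running it. The function name star_index_cmd is hypothetical, and the values 15 and 3 are illustrative assumptions, not settings confirmed to work for hg19 on 16GB.

```shell
# Sketch: print (not run) a STAR genomeGenerate command with
# memory-saving options. Function name and values are illustrative.
star_index_cmd() {
    ram_gb=$1       # RAM to allow STAR, in GB
    sparsity=$2     # suffix array sparsity; larger values need less RAM
    ram_bytes=$((ram_gb * 1000000000))
    echo "STAR --runMode genomeGenerate --genomeDir genomeDir" \
         "--genomeFastaFiles hg19.fa --runThreadN 4" \
         "--genomeSAsparseD $sparsity --limitGenomeGenerateRAM $ram_bytes"
}

# Example: leave about 1GB of a 16GB machine for the rest of the system.
star_index_cmd 15 3
```

Printing the command first makes it easy to inspect the flags before committing to a long run. A sparser suffix array is expected to slow down later mapping.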

written 2.4 years ago by Sa

2.4 years ago, WouterDeCoster wrote:

As far as I know, STAR needs ~30GB of RAM for mapping to the human genome. STAR is fast but eats a lot of memory. You may try HISAT2, or pseudoalignment using e.g. Salmon.
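
A quick way to check a Linux machine against that figure before starting is sketched below. This is only an illustration: the function names are hypothetical, the 30GB threshold is the rough number quoted above rather than an exact requirement, and /proc/meminfo exists only on Linux.

```shell
# Sketch: compare total system memory with a required amount before
# starting a memory-hungry job. Names and threshold are illustrative.

# Print total RAM in GB (rounded down). Reads /proc/meminfo by default;
# an alternative file can be passed as the first argument.
total_ram_gb() {
    awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' "${1:-/proc/meminfo}"
}

# Succeed if available GB ($1) is at least required GB ($2).
has_enough_ram() {
    [ "$1" -ge "$2" ]
}

required_gb=30   # rough figure for the human genome, per the answer above

if [ -r /proc/meminfo ]; then
    have_gb=$(total_ram_gb)
    if has_enough_ram "$have_gb" "$required_gb"; then
        echo "RAM looks sufficient: ${have_gb}GB"
    else
        echo "Only ${have_gb}GB of RAM; the job is likely to be killed"
    fi
fi
```

Because the value is rounded down, a nominal 16GB machine typically reports 15GB here.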

written 2.4 years ago by WouterDeCoster

BBMap is also capable of low-memory usage during index generation and mapping.

written 2.4 years ago by michael.ante

Thank you guys so much for all of your help! I truly appreciate it.

I actually found that another computer of mine has 64GB of RAM, so I tried running the process on it, and it worked!

Your suggestion to use HISAT2 is really interesting. I will definitely consider using it in the future.

written 2.4 years ago by Sa

I have moved my comment to an answer so it can get accepted.

STAR and HISAT2 are both excellent choices, but when memory permits I'd go for STAR. That's a matter of personal preference, though.

written 2.4 years ago by WouterDeCoster