Question

Building STAR genome index continually killed

0

7.0 years ago

Sa • 0

Hi, I am new to STAR, and I am trying to align sequences. First, I moved the reference genome into my new project directory. This reference genome is a single .fa file, created by concatenating the .fa files of the individual chromosomes.

I then created a genome directory, called genomeDir, and gave STAR the path to the reference genome file. My path started from the root of the project directory; I also tried giving the direct path to the reference genome, which I don't believe makes a difference. After making genomeDir, I was getting an "unable to access and write to file" error, which I learned could be solved by creating a STAR file within the genome directory.

The command that I used to actually create the index was:

STAR --runMode genomeGenerate --genomeDir genomeDir --genomeFastaFiles hg19.fa --runThreadN 4

This process kept getting killed without a clear error message, so I tried two things: (1) calling the make command from the STAR source directory and running the commands from that same folder, and (2) adding --genomeSAsparseD 2 to the command. Neither worked. Do you know if this is simply a RAM issue, or why else this might be happening?

Thank you so much for your help!

RNA-Seq star index • 8.6k views

ADD COMMENT • link updated 7.0 years ago by biocyberman ▴ 870 • written 7.0 years ago by Sa • 0

1

This does not quite answer the STAR problem, but here is my approach to the same need:

Although it is quite a bulky tool, I would rely on bcbio-nextgen to manage my aligner indexes and reference genome sequences:

wget https://raw.github.com/bcbio/bcbio-nextgen/master/scripts/bcbio_nextgen_install.py
python bcbio_nextgen_install.py /usr/local/share/bcbio --tooldir=/usr/local \
  --genomes hg19 --genomes hg38 --aligners star --aligners bowtie2

If you do not have much time, you can find premade indexes here: http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STARgenomes/

That said, as @WouterDeCoster mentioned, I find hisat2 quite interesting and have been using it recently. It has premade indexes (https://ccb.jhu.edu/software/hisat2/index.shtml) and can incorporate SNP information into the alignment. In the output BAM file, if a read covers a specific SNP, the SNP's RS number is also used to annotate that read, which is useful in some cases. Hisat2 is part of "the new tuxedo", which I find interesting as well: https://www.nature.com/articles/nprot.2016.095

ADD REPLY • link 7.0 years ago by biocyberman ▴ 870

0

Please use tags appropriately so that experts can easily find your question. In this case star would have been the logical choice, so I have added it to your question.

Note that, if I'm not mistaken, you are talking about creating the index, and not yet about alignment. Therefore I have adapted your title to better reflect what this question is about.

Finally, you suggest it might be a RAM issue, and I agree, but then you should tell us how much RAM your system has.
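For readers who want to answer the RAM question for their own machine, here is a minimal sketch. It assumes a Linux system, where /proc/meminfo reports a "MemTotal" line in kB; the function names mem_total_gib and check_ram are hypothetical helpers, not part of STAR. The threshold is passed in as an argument rather than hard-coded.

```shell
# Print total RAM in whole GiB, read from a meminfo-style file.
# Assumption: Linux, where /proc/meminfo contains "MemTotal: <n> kB".
mem_total_gib() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# Compare total RAM against a required amount in GiB.
# Arguments: required GiB, optional path to a meminfo-style file.
check_ram() {
  needed_gib="$1"
  have_gib=$(mem_total_gib "$2")
  if [ -z "$have_gib" ]; then
    echo "could not read MemTotal"
    return 2
  fi
  if [ "$have_gib" -lt "$needed_gib" ]; then
    echo "only ${have_gib} GiB of RAM, below the ${needed_gib} GiB requested"
    return 1
  fi
  echo "${have_gib} GiB of RAM, at least the ${needed_gib} GiB requested"
}

# Usage (commented out so that sourcing this file has no side effects):
#   check_ram 30
# If a job was killed without a message, the kernel log may show whether the
# out-of-memory killer was responsible (may need root on some systems):
#   dmesg | grep -i -E 'out of memory|killed process'
```

A process that dies with nothing but "Killed" on the terminal is often the out-of-memory killer at work, so checking the kernel log is a cheap first step before changing any aligner settings.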

ADD REPLY • link 7.0 years ago by WouterDeCoster 48k

0

Sorry, I actually tried adding the STAR tag, but it wasn't showing up automatically, so I just left it out.

Also, you are right, I meant to say indexing. Sorry about that as well. My ultimate goal is alignment.

I have 16GB of RAM. I actually just edited the last command to explicitly say "parameter 2" instead of just 2, and now the computer is just running the process for a really long time. Would you know why?

Thank you in advance for your help.

ADD REPLY • link 7.0 years ago by Sa • 0

score 2 · Accepted Answer · 2018-07-13

2

7.0 years ago

WouterDeCoster 48k

As far as I know, STAR needs ~30GB of RAM for mapping to the human genome. STAR is fast but eats a lot of memory. You may try HISAT2, or pseudoalignment using e.g. Salmon.
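For readers who still want to attempt the STAR index on a machine with less memory, the following is a minimal sketch, not a tested recipe. It uses three genomeGenerate options that trade speed for memory: --genomeSAsparseD (a sparser suffix array), --genomeSAindexNbases (a smaller pre-index), and --limitGenomeGenerateRAM (the RAM cap in bytes). The specific values 3, 12, and 15000000000 are assumptions chosen for a 16GB machine and may still not be enough for hg19. The helpers run and build_star_index are hypothetical, and by default the command is only printed.

```shell
# Dry-run wrapper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Arguments: genome directory, FASTA file, thread count, RAM limit in bytes.
# The sparsity and pre-index values are illustrative assumptions.
build_star_index() {
  run STAR --runMode genomeGenerate \
    --genomeDir "$1" \
    --genomeFastaFiles "$2" \
    --runThreadN "$3" \
    --genomeSAsparseD 3 \
    --genomeSAindexNbases 12 \
    --limitGenomeGenerateRAM "$4"
}

# Prints the full command; rerun with DRY_RUN=0 to actually execute it.
build_star_index genomeDir hg19.fa 4 15000000000
```

An index built with a sparser suffix array needs less RAM both to generate and to load, at the cost of slower mapping, so this is a compromise rather than a free fix.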

ADD COMMENT • link 7.0 years ago by WouterDeCoster 48k

1

BBMap can also run with low memory usage during both index generation and mapping.

ADD REPLY • link 7.0 years ago by michael.ante ★ 4.0k

0

Thank you guys so much for all of your help! I truly appreciate it.

I actually found that another computer of mine has 64GB of RAM, so I tried running the process on it, and it worked!

Your suggestion to use HISAT2 is really interesting. I will definitely consider using it in the future.

ADD REPLY • link 7.0 years ago by Sa • 0

0

I have moved my comment to an answer so it can get accepted.

STAR and HISAT2 are both excellent choices, but when memory permits I'd go for STAR. That's a personal preference, though.

ADD REPLY • link 7.0 years ago by WouterDeCoster 48k