Question: Building STAR genome index continually killed
Sa0 wrote (8 months ago):

Hi, I am new to STAR and I am trying to align sequences. First, I moved the reference genome into my new project directory; this reference is a single .fa file created by concatenating the .fa files of the individual chromosomes. I then created a genome directory called genomeDir, with the path to this reference genome file. My path started from the root of the project directory, although I also tried giving the direct path to the reference genome, which I don't believe makes a difference.

After making genomeDir, I was getting an "unable to access and write to file" error, which I learned could be solved by creating a STAR file within the genome directory. The command I used to actually create the index was:

STAR --runMode genomeGenerate --genomeDir genomeDir --genomeFastaFiles hg19.fa --runThreadN 4

This process kept getting killed without a clear error message, so I tried two things: calling the make command from the STAR source directory and running the commands from that same folder, and adding --genomeSAsparseD 2 to the command. Neither worked. Is this simply a RAM issue, or is something else causing it?

Thank you so much for your help!
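A minimal Python sketch of the same indexing step, assuming STAR is on your PATH; build_star_index_cmd, describe_exit and run_star_index are hypothetical helper names, and only the flags from the question are used. It creates genomeDir first, because the STAR manual (at least for older releases) requires that directory to exist and be writable before genomeGenerate, which may be what the "unable to access and write to file" error was about. When a child process is killed, Python's subprocess reports a negative return code (-9 for SIGKILL), which is the typical signature of the Linux out-of-memory killer rather than a STAR error.

```python
import os
import signal
import subprocess


def build_star_index_cmd(genome_dir, fasta_files, threads=4, sa_sparse_d=None):
    """Assemble a STAR genomeGenerate command as an argument list."""
    cmd = [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", *fasta_files,
        "--runThreadN", str(threads),
    ]
    if sa_sparse_d is not None:
        # Sparser suffix array: less RAM, at some cost in mapping speed.
        cmd += ["--genomeSAsparseD", str(sa_sparse_d)]
    return cmd


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable reason."""
    if returncode == 0:
        return "finished successfully"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as -N.
        name = signal.Signals(-returncode).name
        hint = " (often the kernel OOM killer: not enough RAM)" if name == "SIGKILL" else ""
        return f"killed by {name}{hint}"
    return f"exited with status {returncode}"


def run_star_index(genome_dir, fasta_files, threads=4, sa_sparse_d=None):
    # Older STAR releases expect --genomeDir to exist before genomeGenerate.
    os.makedirs(genome_dir, exist_ok=True)
    cmd = build_star_index_cmd(genome_dir, fasta_files, threads, sa_sparse_d)
    result = subprocess.run(cmd)
    return describe_exit(result.returncode)
```

For example, run_star_index("genomeDir", ["hg19.fa"], threads=4, sa_sparse_d=2) runs the same command as above. If you launch STAR from a shell instead, a SIGKILL usually shows up as exit status 137 (128 + 9), and on Linux dmesg often contains an out-of-memory line naming the killed process.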

index rna-seq star

This doesn't quite answer the STAR problem, but here is my approach to the same need:

Although it is quite a bulky tool, I rely on bcbio-nextgen to manage my aligner indexes and reference genome sequences:

python bcbio_nextgen_install.py /usr/local/share/bcbio --tooldir=/usr/local \
  --genomes hg19 --genomes hg38 --aligners star --aligners bowtie2

If you are short on time, you can find premade indexes here.

That said, as @WouterDeCoster mentioned, I find HISAT2 quite interesting and have been using it recently. It has premade indexes and can incorporate SNP information during alignment: in the output alignments, a read that covers a known SNP is also annotated with that SNP's rs number, which is useful in some cases. HISAT2 is part of "the new Tuxedo" protocol (HISAT2, StringTie and Ballgown), which I find interesting as well.
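To illustrate the SNP annotation, here is a minimal sketch that pulls the known-SNP entries out of one SAM line. It assumes the Zs:Z optional tag described in the HISAT2 manual: a comma-separated list of offset|type|ID entries, where type S marks a SNP (D and I mark deletions and insertions) and each offset is relative, in the style of the MD tag. Only reads aligned against an index built with SNP information (e.g. via hisat2-build --snp) carry this tag. The function name known_snps_in_read is hypothetical, and offsets are returned as-is rather than converted to read positions.

```python
def known_snps_in_read(sam_line):
    """Return (offset, type, id) tuples from a HISAT2 Zs:Z tag, or [] if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 SAM columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    for tag in fields[11:]:
        if tag.startswith("Zs:Z:"):
            entries = []
            for entry in tag[len("Zs:Z:"):].split(","):
                offset, kind, snp_id = entry.split("|")
                entries.append((int(offset), kind, snp_id))
            return entries
    return []
```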

written 8 months ago by biocyberman

Please use tags appropriately, so that experts can easily find your question. In this case the star tag would have been the logical choice, so I have added it to your question.

Note that, if I'm not mistaken, you are talking about creating the index, and not yet about alignment. Therefore I have adapted your title to better reflect what this question is about.

Finally, you suggest it might be a RAM issue, which I agree with, but then you should tell us how much RAM your system has.
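If you are unsure how much RAM you have, a minimal Python sketch like the following reports total physical memory. It assumes a POSIX system such as Linux, where os.sysconf exposes SC_PAGE_SIZE and SC_PHYS_PAGES; total_ram_bytes and format_gb are hypothetical helper names. From a Linux shell, free -g gives the same information.

```python
import os


def total_ram_bytes():
    """Total physical memory in bytes (POSIX systems such as Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def format_gb(n_bytes):
    """Format a byte count as decimal gigabytes with one decimal place."""
    return f"{n_bytes / 1e9:.1f} GB"


if __name__ == "__main__":
    print("Total RAM:", format_gb(total_ram_bytes()))
```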

written 8 months ago by WouterDeCoster

Sorry, I did try adding the STAR tag, but it wasn't showing up automatically, so I just left it out.

Also, you are right: I meant to say indexing. Sorry about that as well. My ultimate goal is alignment.

I have 16 GB of RAM. I just edited the last command to explicitly say "parameter 2" instead of just 2, and now the computer has been running the process for a really long time. Do you know why?

Thank you in advance for your help.

written 8 months ago by Sa0
WouterDeCoster wrote (8 months ago):

As far as I know, STAR needs ~30 GB of RAM for both indexing and mapping to the human genome, so 16 GB will not be enough. STAR is fast but uses a lot of memory. You could try HISAT2, or pseudoalignment using e.g. Salmon.
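A rough way to check this before launching STAR: the sketch below counts the bases in a FASTA file and applies the rule of thumb from the STAR manual of roughly 10 bytes of RAM per genome base (about 32 GB recommended for human). Treat the 10-bytes-per-base factor as an approximation; fasta_genome_length, star_ram_estimate_bytes and can_index_with_star are hypothetical helper names.

```python
def fasta_genome_length(path):
    """Count sequence characters (excluding header lines and whitespace) in a FASTA file."""
    total = 0
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def star_ram_estimate_bytes(genome_length, bytes_per_base=10):
    # Rule of thumb: STAR needs roughly 10 bytes of RAM per genome base.
    return genome_length * bytes_per_base


def can_index_with_star(genome_length, available_bytes, bytes_per_base=10):
    """True if the available RAM covers the rule-of-thumb estimate."""
    return available_bytes >= star_ram_estimate_bytes(genome_length, bytes_per_base)
```

For hg19 (about 3.1 Gb) this gives roughly 31 GB, which fits with the 16 GB machine failing and a 64 GB machine succeeding.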


BBMap can also run with low memory usage during both index generation and mapping.

written 8 months ago by michael.ante

Thank you guys so much for all of your help! I truly appreciate it.

I found that another computer I have has 64 GB of RAM, so I tried running the process on it, and it worked!

Your suggestion to use HISAT2 is really interesting. I will definitely consider using it in the future.

written 8 months ago by Sa0

I have moved my comment to an answer so it can get accepted.

STAR and HISAT2 are both excellent choices, but when memory permits I'd go for STAR. That's a matter of personal preference, though.

written 8 months ago by WouterDeCoster
