Question: Building STAR genome index continually killed
Sa0 wrote:

Hi, I am new to STAR, and I am trying to align sequences. First, I moved the reference genome into my new project directory; this reference genome is a single .fa file that was created by concatenating the .fa files of each chromosome. I then created a genome directory, called genomeDir, with the path to this reference genome file. My path started from the root of the project directory, although I also tried giving it the direct path to the reference genome, which I don't believe makes a difference. After making the genomeDir directory, I was getting an "unable to access and write to file" error, which I learned could be solved by creating a STAR file within the genome directory.

The command that I used to actually create the index was:

STAR --runMode genomeGenerate --genomeDir genomeDir --genomeFastaFiles hg19.fa --runThreadN 4

This process kept getting killed without a clear error message, so I tried two things: (1) calling the make command from the STAR source directory and running the commands from that same folder, and (2) adding --genomeSAsparseD 2 to the command. Neither of these worked. Do you know if this is simply a RAM issue, or why else this might be happening?

Thank you so much for your help!
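When a process is "killed without a clear error message", the exit status of the command is often the quickest clue. The sketch below is not from the thread: `describe_exit` is a hypothetical helper that relies on the common shell convention that a status above 128 means "terminated by signal (status - 128)", so 137 corresponds to SIGKILL, which is the signal the Linux out-of-memory killer sends.

```shell
# Sketch: interpret the exit status of a command that has just finished.
# describe_exit is a hypothetical helper name, not part of STAR.
describe_exit() (
    status=$1
    if [ "$status" -gt 128 ]; then
        # Shell convention: status = 128 + signal number
        sig=$((status - 128))
        echo "terminated by signal $sig (SIG$(kill -l "$sig"))"
    elif [ "$status" -eq 0 ]; then
        echo "exited normally"
    else
        echo "exited with error status $status"
    fi
)

# Possible usage right after the STAR command (left commented out here):
#   STAR --runMode genomeGenerate --genomeDir genomeDir --genomeFastaFiles hg19.fa --runThreadN 4
#   describe_exit $?
# On Linux, the kernel log may also record an out-of-memory kill
# (reading it can require root):
#   dmesg | grep -i -E 'out of memory|killed process'
```

A SIGKILL result does not prove a memory problem by itself, but together with an out-of-memory entry in the kernel log it makes RAM the likely cause.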

index rna-seq star • 824 views
modified 8 months ago by biocyberman760 • written 8 months ago by Sa0

This doesn't quite answer the STAR problem, but here is my approach for the same need:

Although it is quite a bulky tool, I would rely on bcbio-nextgen to manage my aligner indexes and reference genome sequences:

wget https://raw.github.com/bcbio/bcbio-nextgen/master/scripts/bcbio_nextgen_install.py
python bcbio_nextgen_install.py /usr/local/share/bcbio --tooldir=/usr/local \
  --genomes hg19 --genomes hg38 --aligners star --aligners bowtie2

If you do not have much time, you can find premade indexes here: http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STARgenomes/

With that said, like @WouterDeCoster mentioned, I find HISAT2 quite interesting and have been using it recently. It has premade indexes (https://ccb.jhu.edu/software/hisat2/index.shtml) and can incorporate information about SNPs during alignment. In the output BAM file, if a read covers a specific SNP, the SNP's RS number is also used to annotate that read, which will be useful in some cases. HISAT2 is part of "the new tuxedo" suite, which I find interesting as well: https://www.nature.com/articles/nprot.2016.095


modified 8 months ago • written 8 months ago by biocyberman760

Please use tags appropriately, so that experts can easily find your question. In this case star would have been very logical, so I have added it to your question.

Note that, if I'm not mistaken, you are talking about creating the index, and not yet about alignment. Therefore I have adapted your title to better reflect what this question is about.

Finally, you suggest it might be a RAM issue, which I agree with, but then you should tell us how much RAM your system has.
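For readers unsure how to find that number, here is a minimal sketch that assumes a Linux system, where the kernel reports total memory in kB on the `MemTotal` line of /proc/meminfo. The function name `total_ram_gb` and its optional file argument are hypothetical conveniences, not part of any tool mentioned in this thread.

```shell
# Sketch: print total system RAM in whole GiB (Linux only).
# An alternative meminfo-style file can be passed as $1, e.g. for testing.
total_ram_gb() (
    meminfo=${1:-/proc/meminfo}
    # MemTotal is given in kB; 1048576 kB = 1 GiB. %d truncates to a whole number.
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 }' "$meminfo"
)

# Usage (left commented out here):
#   total_ram_gb
```

The value is usually a little below the nominal size of the installed memory, because part of it is reserved by the kernel and firmware. On Linux, `free -g` reports similar information.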

written 8 months ago by WouterDeCoster37k

Sorry, I actually tried adding the STAR tag, but it wasn't showing up automatically. So, I just left it out.

Also, you are right, I meant to say indexing. Sorry about that as well. My ultimate goal is alignment.

I have 16GB of RAM. I actually just edited the last command to explicitly say "parameter 2" instead of just 2. And now the computer is just running the process for a really long time. Would you know why?

Thank you in advance for your help.

written 8 months ago by Sa0
WouterDeCoster37k wrote:

As far as I know, STAR needs ~30GB of RAM for mapping to the human genome. STAR is fast but eats a lot of memory. You may try HISAT2, or pseudoalignment using e.g. Salmon.
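To make the memory trade-off concrete, the sketch below builds (but does not run) a genomeGenerate command for a given RAM budget. `--genomeSAsparseD` and `--limitGenomeGenerateRAM` are existing STAR options, but everything else is an assumption for illustration: the helper name `build_star_index_cmd`, the 32 GB threshold, the sparsity value of 3, and the 1 GB of headroom. The thread does not establish that any of these settings is sufficient for hg19 on a 16GB machine; the original attempt with `--genomeSAsparseD 2` was still killed.

```shell
# Sketch: print a STAR index command adapted to the available RAM.
# Threshold, sparsity value and headroom are illustrative guesses, not tested settings.
build_star_index_cmd() (
    ram_gb=$1
    genome_dir=$2
    fasta=$3
    threads=${4:-4}
    cmd="STAR --runMode genomeGenerate --genomeDir $genome_dir --genomeFastaFiles $fasta --runThreadN $threads"
    if [ "$ram_gb" -lt 32 ]; then
        # Sparser suffix array: less RAM, at the cost of slower mapping.
        # Cap index generation at the RAM budget minus 1 GB of headroom (in bytes).
        limit_bytes=$(( (ram_gb - 1) * 1000000000 ))
        cmd="$cmd --genomeSAsparseD 3 --limitGenomeGenerateRAM $limit_bytes"
    fi
    printf '%s\n' "$cmd"
)

# Usage: inspect the command before deciding whether to run it.
build_star_index_cmd 16 genomeDir hg19.fa
```

Printing the command instead of executing it keeps the sketch safe to try. Paths containing spaces would need quoting before the printed line is pasted into a shell.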

written 8 months ago by WouterDeCoster37k

BBMap is also capable of low memory usage during index generation and mapping.

written 8 months ago by michael.ante3.2k

Thank you guys so much for all of your help! I truly appreciate it.

I actually found that another computer of mine has 64GB of RAM, so I tried running the process on it. And it worked!

Your suggestion to use HISAT2 is really interesting. I will definitely consider using it in the future.

written 8 months ago by Sa0

I have moved my comment to an answer so it can get accepted.

STAR and HISAT2 are both excellent choices, but when memory permits I'd go for STAR. That's a personal preference, though.

written 8 months ago by WouterDeCoster37k