Question

Error indexing genome with STAR

7.1 years ago

zubenel ▴ 120

Hi,

I have a FASTQ file with 68 million 100-nt single-end (SE) reads from a human sample. I want to align them to the human reference genome (hg38). I have tried two different approaches:

1. I aligned the reads with the Bowtie2 aligner on the Galaxy platform using default parameters. 44.17% of reads aligned 0 times, 25.26% aligned more than once, and 30.57% aligned exactly once. As I understand this result, I have many multimappers and many reads that did not align at all. Later I learned that Bowtie2 is not the best choice for mapping RNA-Seq reads to the human genome, as it does not properly handle intron-sized gaps.
2. Because of the inaccurate Bowtie2 alignment, I decided to use an aligner that can align reads across splice junctions. I chose STAR, as it is free and one of the most accurate aligners (http://www.nature.com/nmeth/journal/v14/n2/full/nmeth.4106.html). I downloaded the human genome FASTA file from UCSC and the GTF file from Gencode (files: hg38.fa and gencode.v25.annotation.gtf), then tried to index the human genome using STAR:
```
STAR --runThreadN 4 --runMode genomeGenerate --genomeDir ./databases/star_indices_overhang99/ --genomeFastaFiles ./databases/hg38.fa --sjdbGTFfile ./databases/gencode.v25.annotation.gtf --sjdbOverhang 99
```
Mar 08 11:00:52 .... started STAR run
Mar 08 11:00:52 ... starting to generate Genome files

After running this command my computer crashed. I have 16 GB of RAM, and it seems that this is not enough for STAR. As a result, I need to find another way to align the reads. What would you suggest? Would Amazon EC2 be the right option?

rna-seq alignment star software error • 2.2k views

score 0 · Answer 1 · 2017-03-08

7.1 years ago

WouterDeCoster 47k

STAR is indeed very memory-hungry, so that would definitely explain the crash when indexing a large genome such as hg38. Using a bigger server or a cloud platform could help.
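
If a bigger machine is not an option, STAR also has index-generation options that trade mapping speed for a smaller memory footprint. The sketch below only prints a candidate command (a dry run) and does not run STAR. The helper name `star_index_cmd`, the 32 GB threshold (the rough figure usually quoted for a human index), the sparsity value 3 and the 1 GB safety margin are all assumptions for illustration. I have not verified that these settings fit hg38 into 16 GB. The paths are the ones from the question.

```shell
# Hypothetical helper: print a STAR genomeGenerate command for a machine
# with the given amount of RAM in GB. Dry run only; STAR is never executed.
star_index_cmd() {
    ram_gb=$1
    cmd="STAR --runThreadN 4 --runMode genomeGenerate"
    cmd="$cmd --genomeDir ./databases/star_indices_overhang99/"
    cmd="$cmd --genomeFastaFiles ./databases/hg38.fa"
    cmd="$cmd --sjdbGTFfile ./databases/gencode.v25.annotation.gtf"
    if [ "$ram_gb" -lt 32 ]; then
        # Assumed values: a sparser suffix array needs less RAM but maps slower.
        cmd="$cmd --genomeSAsparseD 3"
        # Cap STAR's RAM use, leaving about 1 GB for the OS (assumed margin).
        cmd="$cmd --limitGenomeGenerateRAM $(( (ram_gb - 1) * 1000000000 ))"
    fi
    cmd="$cmd --sjdbOverhang 99"
    printf '%s\n' "$cmd"
}

# Print the command for the 16 GB machine described in the question.
star_index_cmd 16
```

Review the printed command and run it by hand once it looks right. If indexing still runs out of memory, a larger `--genomeSAsparseD` value or a bigger server remains the fallback.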
