Question: Error indexing genome with STAR
2.3 years ago by
zubenel0
zubenel0 wrote:

Hi,

I have a FASTQ file with 68 million 100 nt single-end (SE) reads from a human sample. I want to align them to the human reference genome (hg38). I tried two approaches:

  1. I aligned the reads with the Bowtie2 aligner on the Galaxy platform with default parameters. 44.17% of reads aligned 0 times, 25.26% aligned more than once, and 30.57% aligned exactly once. As I understand this result, I have many multimappers and many reads that did not align at all. I later learned that Bowtie2 is not the best choice for mapping RNA-Seq reads to the human genome, as it does not properly handle intron-sized gaps.

  2. Because of the poor alignment with Bowtie2 on Galaxy, I decided to use an aligner that aligns reads across splice junctions. I chose STAR because it is free and is one of the most accurate aligners (http://www.nature.com/nmeth/journal/v14/n2/full/nmeth.4106.html). I downloaded the human genome FASTA file from UCSC and the GTF file from Gencode (files: hg38.fa and gencode.v25.annotation.gtf). I tried to index the human genome using STAR:

    STAR --runThreadN 4 --runMode genomeGenerate --genomeDir ./databases/star_indices_overhang99/ --genomeFastaFiles ./databases/hg38.fa --sjdbGTFfile ./databases/gencode.v25.annotation.gtf --sjdbOverhang 99
    

    Mar 08 11:00:52 .... started STAR run
    Mar 08 11:00:52 ... starting to generate Genome files

After running this command my computer crashed. I have 16 GB of RAM and it seems that this is not enough for STAR. As a result, I need to find another way to align the reads again. What would you suggest? Would Amazon EC2 be the right option?

2.3 years ago by
Belgium
WouterDeCoster39k wrote:

STAR is indeed very memory-hungry, so that would explain the crash for a large genome such as hg38. Using a bigger server or a cloud platform could help.
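To avoid crashing the machine again, it may help to check the installed RAM before launching genome generation. Below is a minimal sketch, not part of the original posts: `has_enough_mem` is a hypothetical helper name, it reads `MemTotal` from `/proc/meminfo` (so it assumes Linux), and the 32 GB threshold in the usage note is a ballpark for a human-sized genome that should be checked against the STAR manual for your version.

```shell
#!/bin/sh
# Hypothetical helper (not part of STAR): succeed only if the machine has
# at least REQUIRED_GB of memory.
# Usage: has_enough_mem REQUIRED_GB [MEMORY_KB]
has_enough_mem() {
    required_gb=$1
    # If no memory value is passed, read total RAM in kB from /proc/meminfo (Linux only).
    memory_kb=${2:-$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)}
    # Convert the requirement from GB to kB so both sides use the same unit.
    required_kb=$((required_gb * 1024 * 1024))
    [ "$memory_kb" -ge "$required_kb" ]
}
```

For example, `has_enough_mem 32 && STAR --runMode genomeGenerate ...` would only start indexing when the check passes; on a 16 GB machine like the one in the question the check fails and STAR is never launched.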
