I have a FASTQ file with 68 million 100 nt single-end (SE) reads from a human sample. I want to align them to the human reference genome (hg38). So far I have tried two approaches:
First, I aligned the reads with Bowtie2 on the Galaxy platform using default parameters: 44.17% of reads aligned 0 times, 25.26% aligned more than once, and 30.57% aligned exactly once. As I understand this result, I have many multimappers and many reads that did not align at all. I later learned that Bowtie2 is not the best choice for mapping RNA-Seq reads to the human genome, because it does not properly handle intron-sized gaps.
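To put those percentages in absolute terms, here is a rough sketch that converts them to read counts. It assumes the total is exactly 68 million reads (the real count is only approximately that), and `reads_in_millions` is a hypothetical helper name used just for this illustration.

```shell
# reads_in_millions TOTAL_MILLIONS PERCENT
# Hypothetical helper: prints TOTAL_MILLIONS * PERCENT / 100, rounded to 2 decimals.
reads_in_millions() {
    awk -v total="$1" -v pct="$2" 'BEGIN { printf "%.2f\n", total * pct / 100 }'
}

# Total of 68 million reads is an approximation taken from the text above.
echo "aligned 0 times:   $(reads_in_millions 68 44.17) M reads"   # 30.04
echo "aligned >1 times:  $(reads_in_millions 68 25.26) M reads"   # 17.18
echo "aligned 1 time:    $(reads_in_millions 68 30.57) M reads"   # 20.79
```

So roughly 30 million reads did not align at all, and only about 21 million aligned uniquely.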
Because of the poor alignment with Bowtie2 on Galaxy, I decided to use an aligner that can align reads across splice junctions. I chose STAR because it is free and one of the most accurate aligners (http://www.nature.com/nmeth/journal/v14/n2/full/nmeth.4106.html). I downloaded the human genome FASTA file from UCSC and the GTF annotation file from GENCODE (files: hg38.fa and gencode.v25.annotation.gtf), then tried to index the human genome with STAR:
STAR --runThreadN 4 --runMode genomeGenerate --genomeDir ./databases/star_indices_overhang99/ --genomeFastaFiles ./databases/hg38.fa --sjdbGTFfile ./databases/gencode.v25.annotation.gtf --sjdbOverhang 99
Mar 08 11:00:52 ..... started STAR run
Mar 08 11:00:52 ... starting to generate Genome files
After I ran this command, my computer crashed. I have 16 GB of RAM, and it seems that this is not enough for STAR. As a result, I need to find another way to align the reads. What would you suggest? Would using Amazon EC2 be the right option?
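For reference, here is a rough sketch of how the memory requirement can be estimated before running the indexing step. It relies on the rule of thumb from the STAR manual (at least about 10 bytes of RAM per genome base), takes the human genome as roughly 3.1 billion bases (an approximation), and reads `/proc/meminfo`, so the last part only works on Linux. `estimate_star_ram_gb` is a hypothetical helper name, not part of STAR.

```shell
# estimate_star_ram_gb GENOME_BASES
# Hypothetical helper: about 10 bytes of RAM per genome base (STAR manual
# rule of thumb), reported in whole decimal gigabytes.
estimate_star_ram_gb() {
    echo $(( $1 * 10 / 1000000000 ))
}

# Human genome size is assumed to be about 3.1 billion bases.
needed=$(estimate_star_ram_gb 3100000000)
echo "STAR index generation needs roughly ${needed} GB of RAM"

# On Linux, compare against the installed memory (MemTotal is reported in kB).
if [ -r /proc/meminfo ]; then
    have=$(awk '/^MemTotal:/ { printf "%d", $2 / 1000000 }' /proc/meminfo)
    echo "This machine has about ${have} GB of RAM"
fi
```

By this estimate the index needs about 31 GB, roughly twice what my machine has, which would be consistent with the crash.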