Cannot Generate Genome from RNA Transcript in STAR
4.2 years ago
rekren ▴ 30

Hello,

I want to map some RNA-Seq files to the human transcriptome to see how many of the reads map to it. I've downloaded the human reference transcript file "GCF_000001405.36_GRCh38.p10_rna.fna" from NCBI and planned to use it as the source to generate a genome index for STAR mapping.

The problem is that I get this error every time:

STAR --runThreadN 12 --runMode genomeGenerate --genomeDir /mnt/data/tg-hp/human_transcriptome --genomeFastaFiles /mnt/data/tg-hp/human_transcriptome/GCF_000001405.36_GRCh38.p10_rna.fna


May 09 10:43:56 ..... started STAR run
May 09 10:43:56 ... starting to generate Genome files

EXITING because of FATAL PARAMETER ERROR: limitGenomeGenerateRAM=31000000000is too small for your genome
SOLUTION: please specify limitGenomeGenerateRAM not less than107638450218 and make that much RAM available

May 09 11:16:15 ...... FATAL ERROR, exiting

The human reference transcript file is only about 600 MB, yet STAR asks for about 107 GB of RAM to generate a genome index from it. In my previous work, even the human reference genome, which is larger (3.3 GB), did not require this much RAM for genome generation.

I have only 32 GB of RAM and 128 GB of swap on my computer.

I also tried setting the --genomeSAindexNbases parameter to 8 in the hope of overcoming the problem, but I get the same error again.

Can you help me solve this, please?

RNA-Seq STAR genomeGenerate • 4.3k views

As far as I know, STAR isn't really designed for mapping to the transcriptome; you are supposed to use it for spliced alignment to the reference genome. If you want to work with the transcriptome, I would suggest kallisto or salmon.


Thanks for the suggestion; I can use kallisto for the following studies.

4.2 years ago

STAR is known to be memory hungry (especially at the genome generation step) and, as @Wouter says, it may not be the right tool for mapping directly to a transcriptome. STAR's memory requirement also grows with the number of reference sequences, not just with the genome size. That is why, even for the same total sequence size, creating the index needs a huge amount of memory in your transcriptome case, where the number of references is very large. Note also that this much memory is required only for creating the index, not for the mapping itself. So you could create the index on another machine with more memory and then run the alignment on your local one.
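To make the "number of references" point concrete, here is a rough sketch. It relies on the STAR manual's description of --genomeChrBinNbits (each reference occupies an integer number of bins of 2^genomeChrBinNbits bases, default 18). The transcript count and length below are hypothetical round numbers, not measured from the RefSeq file, and `padded_genome_length` is a name made up for this illustration. It only shows how the padding scales; it is not STAR's actual memory accounting.

```python
def padded_genome_length(ref_lengths, chr_bin_nbits=18):
    """Total length once every reference is rounded up to whole bins.

    Bin size is 2**chr_bin_nbits bases (18 is STAR's default for
    --genomeChrBinNbits). Illustration only, not STAR's real accounting.
    """
    bin_size = 1 << chr_bin_nbits
    # -(-n // bin_size) is integer ceiling division.
    return sum(-(-n // bin_size) * bin_size for n in ref_lengths)


# Hypothetical transcriptome: 160,000 transcripts of 3,000 bases each.
transcripts = [3_000] * 160_000
raw_total = sum(transcripts)  # 480,000,000 bases

for bits in (18, 16, 14):
    padded = padded_genome_length(transcripts, bits)
    print(f"--genomeChrBinNbits {bits}: {padded:,} padded bases "
          f"({padded / raw_total:.1f}x the raw sequence)")
```

With many short references, most of each default-sized bin is padding, which is why a smaller bin size helps so much for a transcriptome and hardly matters for a genome with a few dozen chromosomes.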

But if you insist on running STAR on your own machine, try the following:

1. Get the latest patched release of STAR. Newer versions include a lot of memory optimizations.
2. Increase --genomeSAsparseD to 2 or even 3.
3. The recommended --genomeSAindexNbases is between 10 and 15; try keeping it at 12.
4. Reduce --genomeChrBinNbits to 16 or even 14; this should reduce memory usage when the number of references is large.

All these parameters are well explained in section 9.5 of the STAR manual (Genome Generation Parameters).
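If you would rather derive starting values from your own FASTA than guess, the STAR manual gives two scaling rules: --genomeChrBinNbits = min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))) and, for small genomes, --genomeSAindexNbases = min(14, log2(GenomeLength)/2 - 1). Below is a minimal sketch of those two formulas. The function names are made up for this example, and rounding down to an integer is my assumption, since the manual does not say how to round.

```python
import math


def fasta_lengths(lines):
    """Return the length of each sequence from an iterable of FASTA lines."""
    lengths = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)          # a header starts a new record
        elif line and lengths:
            lengths[-1] += len(line)   # sequence lines extend the current record
    return lengths


def suggest_chr_bin_nbits(genome_length, n_refs, read_length):
    """min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))), floored."""
    return min(18, int(math.log2(max(genome_length / n_refs, read_length))))


def suggest_sa_index_nbases(genome_length):
    """min(14, log2(GenomeLength)/2 - 1), floored (rounding is an assumption)."""
    return min(14, int(math.log2(genome_length) / 2 - 1))


# Tiny in-memory FASTA, only to show the calling pattern.
demo = [">tx1", "ACGTACGT", "ACGT", ">tx2", "GGCC"]
lengths = fasta_lengths(demo)
print(lengths, sum(lengths), len(lengths))
```

For a real file you could pass an open file handle, e.g. `fasta_lengths(open("your_transcripts.fna"))` (placeholder path), and feed `sum(lengths)` and `len(lengths)` into the two helpers. Treat the results as starting points; the values that actually worked in this thread were found by trial.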

You may also want to look at this thread about mapping to a transcriptome: "Aligning Reads To A Reference Transcriptome".


While I was trying to figure out how to use kallisto, I saw your reply, and it worked like a charm. Thanks a lot.

Here is the command line that worked in my case, for other people who might face the same problem.

I am using STAR 2.5.3a.

STAR --runThreadN 12 --runMode genomeGenerate --genomeSAsparseD 3 --genomeSAindexNbases 12 --genomeChrBinNbits 14 --genomeDir /mnt/data/tg-hp/human_transcriptome --genomeFastaFiles /mnt/data/tg-hp/human_transcriptome/GCF_000001405.36_GRCh38.p10_rna.fna


Your parameters helped me to solve "Segmentation fault (core dumped)" problem on the step "inserting junctions into the genome indices". Thank you Santosh!
