Question: Cannot Generate Genome from RNA Transcript in STAR
rekren wrote (2.4 years ago):

Hello,

I want to map some RNA-Seq files to the human transcriptome to see what fraction of the reads maps to it. I've downloaded the human reference transcript set "GCF_000001405.36_GRCh38.p10_rna.fna" from NCBI and planned to use it to generate a STAR genome index for mapping.

The problem is that I get this error every time:

STAR --runThreadN 12 --runMode genomeGenerate --genomeDir /mnt/data/tg-hp/human_transcriptome --genomeFastaFiles /mnt/data/tg-hp/human_transcriptome/GCF_000001405.36_GRCh38.p10_rna.fna

May 09 10:43:56 ..... started STAR run
May 09 10:43:56 ... starting to generate Genome files

EXITING because of FATAL PARAMETER ERROR: limitGenomeGenerateRAM=31000000000is too small for your genome
SOLUTION: please specify limitGenomeGenerateRAM not less than107638450218 and make that much RAM available

May 09 11:16:15 ...... FATAL ERROR, exiting

The human reference transcript file is only 600 MB, yet STAR requires about 107 GB of RAM to generate an index from it. In my previous work, the human reference genome never required this much RAM for index generation, despite its much larger size (3.3 GB).

I have only 32 GB of RAM and 128 GB of swap on my computer.

I also tried setting --genomeSAindexNbases to 8 in the hope of overcoming the problem, but I get the same error.

Can you help me solve this, please?

Thanks in advance.


As far as I know, STAR isn't really designed for mapping to the transcriptome; it is meant for spliced alignment to the reference genome. If you want to work with the transcriptome, I would suggest kallisto or salmon.

Reply by WouterDeCoster (2.4 years ago)

Thanks for the suggestion; I can use kallisto for future studies.

Reply by rekren (2.4 years ago)
Santosh Anand wrote (2.4 years ago):

STAR is known to be memory-hungry (especially at the genome generation step), and as @Wouter says, STAR may not be the right tool for mapping directly to the transcriptome. Apart from the genome size, STAR's memory requirement also grows with the number of references (sequences in the FASTA file). That's why, even though the transcriptome is much smaller than the genome, the memory needed to create the index is huge in your case: a transcriptome contains a very large number of references. Note also that this much memory is required only for creating the index, not for the mapping itself. So you could create the index on another machine with more memory and then run the alignment on your local one.
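Because the number of references matters alongside the total sequence length, it can help to check both numbers for your FASTA before running genomeGenerate. A minimal sketch, assuming a plain (uncompressed) FASTA file and a POSIX awk; the function name fasta_stats is hypothetical and not part of STAR:

```shell
# Hypothetical helper (not part of STAR): print the number of references
# (FASTA header lines) and the total sequence length of a FASTA file.
fasta_stats() {
  awk '
    /^>/ { n++; next }                      # header line: one more reference
    { sub(/\r$/, ""); len += length($0) }   # sequence line: add its length, ignoring a trailing CR
    END { printf "%d %.0f\n", n, len }      # %.0f avoids 32-bit overflow for genome-sized totals
  ' "$1"
}
```

For example, `fasta_stats GCF_000001405.36_GRCh38.p10_rna.fna` prints two numbers: the reference count first, then the summed sequence length. Multi-line FASTA records are handled because every non-header line is added to the running total.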

But if you insist on running STAR on your machine, do the following:

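  1. Get the latest patched release of STAR. There have been many memory optimizations in newer versions.
  2. Increase --genomeSAsparseD to 2 or even 3. A sparser suffix array needs less RAM, at the cost of slower mapping.
  3. The recommended --genomeSAindexNbases is between 10 and 15; maybe keep it at 12.
  4. Reduce --genomeChrBinNbits to 16 or even 14. Each reference occupies an integer number of bins of 2^genomeChrBinNbits bases, so with the default of 18 every short transcript takes up at least 2^18 bases; smaller bins should reduce memory usage when there is a large number of references.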

All these parameters are well explained in section 9.5 of the STAR manual (Genome Generation Parameters).
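The STAR manual also gives scaling rules for two of these parameters: genomeChrBinNbits = min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))) for genomes with many references, and genomeSAindexNbases = min(14, log2(GenomeLength)/2 - 1) for small genomes. The sketch below simply evaluates those two formulas. Rounding down to an integer and the default read length of 100 are assumptions here, and star_bin_params is a hypothetical helper, not part of STAR. Treat the output as a starting point, not a guarantee that the index will fit in your RAM.

```shell
# Hypothetical helper (not part of STAR): evaluate the STAR manual's scaling rules
#   genomeChrBinNbits   = min(18, log2(max(GenomeLength / NumberOfReferences, ReadLength)))
#   genomeSAindexNbases = min(14, log2(GenomeLength) / 2 - 1)
# Results are rounded down to integers (an assumption; the manual states no rounding rule).
star_bin_params() {
  local genome_len="$1" n_refs="$2" read_len="${3:-100}"   # read length defaults to 100 (assumed)
  awk -v L="$genome_len" -v N="$n_refs" -v R="$read_len" 'BEGIN {
    if (N <= 0 || L <= 0) {
      print "genome length and reference count must be positive" > "/dev/stderr"
      exit 1
    }
    per_ref = L / N                              # average reference length
    if (per_ref < R) per_ref = R                 # max(GenomeLength/NumberOfReferences, ReadLength)
    # the small epsilon guards against floating-point error at exact powers of two
    bin = int(log(per_ref) / log(2) + 1e-9)
    if (bin > 18) bin = 18
    sa = int(log(L) / log(2) / 2 - 1 + 1e-9)
    if (sa > 14) sa = 14
    printf "--genomeChrBinNbits %d --genomeSAindexNbases %d\n", bin, sa
  }'
}
```

For example, `star_bin_params <total length> <number of sequences> 100` prints two flags that can be pasted into the genomeGenerate command line. The two inputs are the summed sequence length and the number of header lines in the FASTA file. For a whole-genome FASTA with few, long chromosomes both values hit their caps (18 and 14), while a transcriptome with many short references gets a much smaller genomeChrBinNbits.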

You may also want to look at this thread about mapping to a transcriptome: "Aligning Reads To A Reference Transcriptome".


While I was trying to figure out how to use kallisto, I saw your reply, and it worked like a charm. Thanks a lot!

Here is the command line that worked in my case, for anyone else who might face the same problem.

I am using STAR 2.5.3a.

STAR --runThreadN 12 --runMode genomeGenerate --genomeSAsparseD 3 --genomeSAindexNbases 12 --genomeChrBinNbits 14 --genomeDir /mnt/data/tg-hp/human_transcriptome --genomeFastaFiles /mnt/data/tg-hp/human_transcriptome/GCF_000001405.36_GRCh38.p10_rna.fna
Reply by rekren (2.4 years ago)

Your parameters helped me solve a "Segmentation fault (core dumped)" error at the "inserting junctions into the genome indices" step. Thank you, Santosh!

Reply by dmitriy (29 days ago)