Hi everyone, I’m trying to better understand the practical differences between generating a STAR genome index manually versus generating it through the nf-core/rnaseq pipeline.
Most STAR tutorials and forum posts emphasize that large genomes (e.g., human) don’t necessarily require extremely high RAM if parameters such as --limitGenomeGenerateRAM, --genomeChrBinNbits, or --genomeSAindexNbases are adjusted correctly.
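For reference, this is roughly the kind of manual index build I have in mind. Paths and numeric values below are placeholders, not recommendations:

```shell
# Placeholder paths/values; the STAR flags are real, the numbers are just examples.
# --limitGenomeGenerateRAM is specified in bytes (here ~32 GB).
# --genomeSAindexNbases 14 is the default; it is usually lowered only for small genomes.
# --genomeChrBinNbits matters mainly for assemblies with many scaffolds.
STAR --runMode genomeGenerate \
     --runThreadN 8 \
     --genomeDir star_index/ \
     --genomeFastaFiles genome.fa \
     --sjdbGTFfile annotation.gtf \
     --sjdbOverhang 99 \
     --limitGenomeGenerateRAM 32000000000 \
     --genomeSAindexNbases 14
```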
However, nf-core abstracts these settings away, and I’m not fully clear on how much control the user has over STAR parameters during index generation inside the workflow.
So I’d like to ask the community:
Does nf-core override or restrict STAR’s index-generation parameters? For example, can you pass optimized values for --limitGenomeGenerateRAM, --genomeSAindexNbases, etc., or does it rely mostly on defaults?

In your experience, is there any performance or efficiency advantage to building the index within nf-core compared to building it manually with STAR? Or is it generally better to generate the index directly with STAR, especially when you want full control?
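For context, my current (possibly wrong) understanding is that nf-core DSL2 pipelines let you override module arguments through ext.args in a custom config passed with -c. Something like the sketch below, where the process selector is my guess and would need to be checked against the pipeline’s conf/modules.config:

```groovy
// custom.config — pass with: nextflow run nf-core/rnaseq -c custom.config ...
process {
    // Selector name is an assumption; verify the exact process path
    // in the pipeline's conf/modules.config.
    withName: '.*:STAR_GENOMEGENERATE' {
        ext.args = '--limitGenomeGenerateRAM 32000000000 --genomeSAindexNbases 14'
        memory   = '32 GB'
    }
}
```

Is this the intended mechanism for tuning index generation, or are some of these parameters fixed by the pipeline?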
Are there cases where nf-core’s index building is actually slower or more resource-intensive due to conservative defaults? I’ve seen some reports of unusually long runtimes.
Finally, is there any practical downside to simply providing nf-core with a manually generated STAR index instead of letting the workflow generate its own?
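Concretely, I mean something like the following (paths are placeholders; I believe nf-core/rnaseq accepts a --star_index parameter, and that a manually built index has to match the STAR version inside the pipeline’s container, but please correct me if either assumption is off):

```shell
# Placeholder paths; profile and other options depend on your setup.
nextflow run nf-core/rnaseq \
    --input samplesheet.csv \
    --outdir results/ \
    --fasta genome.fa \
    --gtf annotation.gtf \
    --star_index /path/to/star_index \
    -profile docker
```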
I’m not debating workflows, just trying to understand how much flexibility exists and whether manual index generation is still preferred when dealing with large genomes or limited compute environments.
Thanks!