I am trying to generate a genome index for the pepper (Capsicum annuum) genome with STAR. It is a rather large genome of about 3.5 GB, and the genome FASTA contains 12 pseudomolecule assemblies plus over 30,000 scaffolds. I call STAR with the following command:
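(For reference, I got the sequence count by counting FASTA header lines; a minimal sketch on a throw-away example file, since the real file is the Annuum.v1.6.Total.fa used below:)

```shell
# Count sequences in a FASTA by counting header ('>') lines.
# Illustrated on a tiny temporary file; on the real genome the same
# grep -c '^>' is run against the genome FASTA instead.
cat > /tmp/example.fa <<'EOF'
>chr1
ACGT
>scaffold_1
ACGT
>scaffold_2
ACGT
EOF
grep -c '^>' /tmp/example.fa   # prints 3
```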
$STAR --runMode genomeGenerate –-genomeDir /data_raid1_ssd/databases/genomes/pepper/star --genomeFastaFiles /data_raid1_ssd/databases/genomes/pepper/Annuum.v1.6.Total.fa --runThreadN 8 --genomeChrBinNbits 16 --sjdbGTFfile /data_raid1_ssd/databases/genomes/pepper/Annuum.v.2.0.chromosome.gff3 --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang 99
The process exits with the following message:
genomeGenerate.cpp:209:genomeGenerate: exiting because of *OUTPUT FILE* error: could not create output file ./GenomeDir//chrName.txt Solution: check that the path exists and you have write permission for this file
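(Following the "Solution" hint in the message, I did sanity-check beforehand that the target directory exists and is writable; a minimal sketch on a throw-away path, since the real directory to test is the --genomeDir value from my command:)

```shell
# Check that an index output directory exists and is writable before
# launching STAR. Shown on a throw-away path for illustration.
dir=/tmp/star_index_check
mkdir -p "$dir"
if [ -d "$dir" ] && [ -w "$dir" ]; then
    echo "OK: $dir is writable"
else
    echo "ERROR: cannot write to $dir" >&2
fi
```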
I am using a Linux machine with 64 GB of RAM, which should be sufficient (10 x genome size = 35 GB), and essentially the same command line worked fine for generating the Arabidopsis genome index. My guess is that the problem is caused by the huge number of temporary files STAR creates because of the many scaffolds in the genome FASTA, but I might be wrong. I already increased my allowed number of open files to 16384 with
ulimit -n 16384
command, but it didn't help. Is there anything I can do to tweak STAR so it deals better with this large number of scaffolds, or is there another solution to the problem?
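(In case it matters: as far as I understand the STAR manual, for genomes with many references --genomeChrBinNbits should be scaled down to min(18, log2(GenomeLength/NumberOfReferences)), which is how I arrived at 16. A sketch of that calculation with my approximate numbers, 3.5 Gb and ~30,000 sequences:)

```shell
# Recommended --genomeChrBinNbits per the STAR manual for genomes with
# a large number of references: min(18, log2(GenomeLength/NumberOfReferences)).
# The genome size and sequence count below are my approximate figures.
awk 'BEGIN {
    genome_len = 3.5e9      # ~3.5 Gb pepper genome
    n_refs     = 30000      # ~30,000 scaffolds + 12 pseudomolecules
    bits = int(log(genome_len / n_refs) / log(2))   # floor of log2
    if (bits > 18) bits = 18
    print bits
}'
# prints 16
```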