I am trying to generate a genome index for the pepper (Capsicum annuum) genome using STAR. It's a really large genome of 3.5 Gb, and the genome FASTA contains 12 pseudomolecule assemblies plus over 30,000 scaffolds. I call STAR with the following command:
$STAR --runMode genomeGenerate –-genomeDir /data_raid1_ssd/databases/genomes/pepper/star --genomeFastaFiles /data_raid1_ssd/databases/genomes/pepper/Annuum.v1.6.Total.fa --runThreadN 8 --genomeChrBinNbits 16 --sjdbGTFfile /data_raid1_ssd/databases/genomes/pepper/Annuum.v.2.0.chromosome.gff3 --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang 99
The process exits with the following message:
genomeGenerate.cpp:209:genomeGenerate: exiting because of *OUTPUT FILE* error: could not create output file ./GenomeDir//chrName.txt
Solution: check that the path exists and you have write permission for this file
I am using a Linux machine with 64 GB RAM, which should be enough (10 x genome size = 35 GB), and basically the same command line worked fine for generating the Arabidopsis genome index. My guess is that the huge number of temporary files STAR generates, due to the huge number of scaffolds in the genome FASTA file, causes the problem, but I might be wrong. I already increased my allowed number of open files to 16384 using the
ulimit -n 16384
command, but it didn't help. Is there anything I can do to tweak STAR to better deal with this large number of scaffolds, or is there any other solution to the problem?
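For what it's worth, the STAR manual suggests scaling --genomeChrBinNbits down for genomes with very many references, roughly as min(18, log2(GenomeLength/NumberOfReferences)). A minimal shell sketch of that calculation, using approximate numbers taken from the question (3.5 Gb, 12 pseudomolecules plus ~30,000 scaffolds):

```shell
# Recommended --genomeChrBinNbits for many-contig genomes (per the STAR manual):
#   min(18, log2(GenomeLength / NumberOfReferences))
GENOME_LENGTH=3500000000   # ~3.5 Gb pepper genome (approximation)
N_REFS=30012               # 12 pseudomolecules + ~30,000 scaffolds (approximation)

CHR_BIN_NBITS=$(awk -v g="$GENOME_LENGTH" -v n="$N_REFS" \
  'BEGIN { b = int(log(g / n) / log(2)); if (b > 18) b = 18; print b }')
echo "$CHR_BIN_NBITS"
```

For these numbers it comes out to 16, which matches the value already used in the command above, so the scaffold count by itself should not be the blocker here.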
Thanks R
Huh? I'm reading the error as "the output folder is not writable", hence a permission error. Why are we talking about memory? Maybe you're missing --outFileNamePrefix? According to this post, try to tweak the number of threads too.
You can also try to increase your number of open files beyond 16384.
I've increased the open-files limit further and now used all 16 available threads. Didn't help.
I used top to check %MEM and it says only 10.8%.
Your problem is not related to memory: STAR can't reach a file while running (have you tried --outFileNamePrefix?). Try running your command without threading to see if the issue still stands.
Which step of the process failed? Also, could we have the complete log file?
I am not quite sure how to use --outFileNamePrefix in the context of my command line. But here is a link to the Log.out file that was generated.
Log.out
Not sure which step, but it happens pretty quickly (ca. 2 min after starting the job).
Do you have all rights on /data_raid1_ssd/databases/genomes/pepper/star? Does the path /data_raid1_ssd/databases/genomes/pepper/star exist? Maybe rename it /data_raid1_ssd/databases/genomes/pepper/star_index/.
The option --outFileNamePrefix is used to write the output files to a directory other than the current one; try setting it to a directory owned by yourself.
Also try removing some options to see if that affects the result, just keep it simple:
Somehow your --genomeDir isn't registered. If you look closely at your Log.out, as suggested by Bastien, you'll notice that genomeDir is set back to the default value ./GenomeDir/.
A long while back, I had something similar when using nohup. While I haven't seen any such permission error in recent years, I continue to use --outFileNamePrefix for historical reasons. Below is an example: outFileNamePrefix is simply whatever I have in genomeDir, plus /. Hope it helps. Good luck.
Btw, I see you're from CSHL. I hope you're running away from mosquitoes this time of year! :)