Question: STAR genome index problem
Ricky (Germany) wrote, 8 months ago:

I am trying to generate a genome index for the pepper (Capsicum annuum) genome using STAR. It's a very large genome of 3.5 GB, and the genome FASTA contains 12 pseudomolecule assemblies and over 30,000 scaffolds. I call STAR with the following command:

$STAR --runMode genomeGenerate –-genomeDir /data_raid1_ssd/databases/genomes/pepper/star --genomeFastaFiles /data_raid1_ssd/databases/genomes/pepper/Annuum.v1.6.Total.fa --runThreadN 8 --genomeChrBinNbits 16 --sjdbGTFfile /data_raid1_ssd/databases/genomes/pepper/Annuum.v.2.0.chromosome.gff3 --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang 99

The process exits with the following message:

genomeGenerate.cpp:209:genomeGenerate: exiting because of *OUTPUT FILE* error: could not create output file ./GenomeDir//chrName.txt
Solution: check that the path exists and you have write permission for this file

I am using a Linux machine with 64 GB RAM, which should be enough (10 x genome size = 35 GB), and basically the same command line worked fine for generating the Arabidopsis genome index. My guess is that the huge number of scaffolds in the genome FASTA makes STAR generate a huge number of temporary files, and that this causes the problem, but I might be wrong. I already increased my allowed number of open files to 16384 using the

ulimit -n 16384

command, but it didn't help. Is there anything I can do to tweak STAR to handle this large number of scaffolds better, or is there any other solution to the problem?

Thanks, R.

Tags: sequencing, rna-seq, genome
Question by Ricky, edited 8 months ago by Bastien Hervé
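On the question of tuning STAR for many scaffolds: the STAR manual recommends that, for a genome with a large number of references, --genomeChrBinNbits be scaled as min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))). Below is a minimal bash/awk sketch that computes this from the FASTA. The function name suggest_chr_bin_nbits, the default read length of 100 (matching --sjdbOverhang 99) and rounding down to an integer are assumptions. With roughly 3.5 Gb spread over about 30,000 sequences, the average length is about 117,000 bp, whose log2 is about 16.8, so it suggests 16, the value already used in the question. This parameter mainly affects memory use; it does not by itself explain a file-creation error.

```shell
#!/usr/bin/env bash
# Suggest a value for STAR's --genomeChrBinNbits from a FASTA file, using the STAR
# manual's rule of thumb for genomes with many references:
#   min(18, log2(max(GenomeLength / NumberOfReferences, ReadLength)))
# Rounding down to an integer is an assumption of this sketch.
# Usage: suggest_chr_bin_nbits genome.fa [read_length]   (read_length defaults to 100)
suggest_chr_bin_nbits() {
  local fasta=$1 read_len=${2:-100}
  awk -v rl="$read_len" '
    /^>/ { n++; next }                            # each header line starts a new reference
    { gsub(/[ \t\r]/, ""); len += length($0) }    # add sequence characters to the genome length
    END {
      if (n == 0) { print "no FASTA headers found" > "/dev/stderr"; exit 1 }
      x = len / n                                 # average reference length
      if (x < rl + 0) x = rl + 0                  # max(average length, read length)
      b = int(log(x) / log(2))                    # log2, rounded down
      if (b > 18) b = 18                          # cap at 18
      print b
    }' "$fasta"
}
```

For the pepper genome this would be run as suggest_chr_bin_nbits /data_raid1_ssd/databases/genomes/pepper/Annuum.v1.6.Total.fa 100; reading a 3.5 GB file with awk takes a while.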

Huh? I read the error as saying the output folder is not writable, i.e. a permission error. Why are we talking about memory? Maybe you're missing --outFileNamePrefix?

Reply by Eric Lim, 8 months ago

According to this post, try tweaking the number of threads too.

You can also try raising your open-file limit above 16384.

Reply by Bastien Hervé, 8 months ago
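Two details of ulimit matter here: it only changes the limits of the current shell and the processes started from it, and an unprivileged user can raise the soft limit only up to the hard limit (ulimit -Hn). A small sketch that applies both rules and prints the limit actually in effect; raise_nofile_limit is a hypothetical helper name, and it is meant to be called in the same shell or job script that then launches STAR.

```shell
#!/usr/bin/env bash
# Raise this shell's soft limit on open files towards a target value, capped at the
# hard limit (an unprivileged user cannot go above `ulimit -Hn`).
# ulimit only affects the current shell and the processes it starts, so call this in
# the same shell or job script that then launches STAR.
# raise_nofile_limit is a hypothetical helper name, not part of STAR.
raise_nofile_limit() {
  local target=$1 hard
  hard=$(ulimit -Hn)
  if [ "$hard" != "unlimited" ] && [ "$target" -gt "$hard" ]; then
    echo "requested $target exceeds the hard limit $hard; using $hard" >&2
    target=$hard
  fi
  ulimit -Sn "$target" || return 1
  ulimit -Sn                        # print the soft limit now in effect
}
```

If the printed value is lower than requested, the hard limit has to be raised by an administrator (for example via the system's limits configuration) before ulimit -n can go higher.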

I've increased the limit with

ulimit -n 36000

and am now using all 16 available threads. It didn't help.

Reply by Ricky, 8 months ago

I used top to check for %MEM and it says only 10.8%.

Reply by Ricky, 8 months ago

Your problem is not related to memory: STAR can't access a file while running. Possible causes:

  • It can't reach the file because of a path issue
  • It can't write the file because of a permission issue
  • Too many files are open at the same time for STAR (as said above, you can try setting --outFileNamePrefix)

Try running the command without multithreading to see if the issue persists.

Which step of the process failed? Also, could we see the complete log file?

Reply by Bastien Hervé, 8 months ago

I'm not quite sure how to use --outFileNamePrefix in the context of my command line, but here is a link to the Log.out file that was generated:

Log.out

I'm not sure which step, but it happens pretty quickly (about 2 min after starting the job).

Reply by Ricky, 8 months ago

Do you have full rights on /data_raid1_ssd/databases/genomes/pepper/star?

Does the path /data_raid1_ssd/databases/genomes/pepper/star exist?

Maybe rename it to /data_raid1_ssd/databases/genomes/pepper/star_index/

The --outFileNamePrefix option makes STAR write its output files somewhere other than the current directory. Try setting it to a directory you own.

Try removing some options to see if that affects the result; keep it simple:

$STAR --runMode genomeGenerate --genomeDir /data_raid1_ssd/databases/genomes/pepper/star --genomeFastaFiles /data_raid1_ssd/databases/genomes/pepper/Annuum.v1.6.Total.fa

Reply by Bastien Hervé, 8 months ago
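The two questions above (does the path exist, can you write to it?) can be answered in one step before starting STAR. A minimal sketch, using the hypothetical helper name check_output_dir: it creates the directory if needed and then actually creates and deletes a probe file, which is the same operation STAR has to perform for chrName.txt and the other index files.

```shell
#!/usr/bin/env bash
# Check that a directory exists (creating it if necessary) and that the current user
# can create files in it, before passing it to STAR as --genomeDir.
# check_output_dir is a hypothetical helper name, not part of STAR.
check_output_dir() {
  local dir=$1 probe
  if ! mkdir -p -- "$dir" 2>/dev/null; then
    echo "cannot create $dir: check the parent directory and its permissions" >&2
    return 1
  fi
  # Try to create (and then delete) a file, which is exactly what STAR has to do.
  if ! probe=$(mktemp "$dir/.star_write_test.XXXXXX" 2>/dev/null); then
    echo "cannot create files in $dir: check ownership and permissions (ls -ld $dir)" >&2
    return 1
  fi
  rm -f -- "$probe"
  echo "$dir exists and is writable"
}
```

For example, check_output_dir /data_raid1_ssd/databases/genomes/pepper/star prints a confirmation or explains which step failed.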

Somehow your --genomeDir isn't being picked up. If you look closely at your Log.out, as Bastien suggested, you'll notice that genomeDir is reset to the default value ./GenomeDir/.

A long while back, I had something similar when using nohup. While I haven't seen any such permission error in recent years, I continue to use --outFileNamePrefix for historical reasons.

Below is an example from a Snakemake script; outFileNamePrefix is simply whatever I have in genomeDir plus a trailing /.

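from snakemake.shell import shell

# fa, gtf, idx and args are assumed to be defined earlier in this script:
# the decompressed FASTA/GTF paths, the index directory, and extra STAR options.
shell(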
  """
  pigz -d -c {snakemake.input.fagz} > {fa}
  pigz -d -c {snakemake.input.gtfgz} > {gtf}
  STAR --runMode genomeGenerate          \
       --runThreadN {snakemake.threads}  \
       --genomeDir {idx}                 \
       --outFileNamePrefix {idx}/        \
       --outTmpDir {idx}/tmp/            \
       --genomeFastaFiles {fa}           \
       --sjdbGTFfile {gtf}               \
       {args}
  rm -rf {fa} {gtf} {idx}/tmp
  """)

Hope it helps. Good luck.

Btw, I see you're from CSHL. I hope you're running away from mosquitoes this time of year! :)

Reply by Eric Lim, 8 months ago
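A plain-bash version of the STAR step in Eric Lim's Snakemake example above, applied to the command from the question. It is a sketch under stated assumptions: build_star_index is a hypothetical helper name, the STAR binary is taken from $STAR as in the question (falling back to STAR on the PATH), and any further STAR options are passed through unchanged. Besides deleting the temporary directory after a successful run, as the Snakemake snippet does, it also removes a leftover one beforehand, because STAR stops with an error when the directory given to --outTmpDir already exists.

```shell
#!/usr/bin/env bash
# Build a STAR genome index so that Log.out and the other run files are written into
# the index directory (--outFileNamePrefix) rather than the current directory.
# The STAR binary is taken from $STAR, as in the question, falling back to STAR on PATH.
# build_star_index is a hypothetical helper name; extra arguments are passed to STAR as-is.
build_star_index() {
  if [ $# -lt 2 ]; then
    echo "usage: build_star_index GENOME_DIR FASTA [extra STAR options...]" >&2
    return 2
  fi
  local genome_dir=${1%/} fasta=$2
  shift 2
  mkdir -p -- "$genome_dir" || return 1
  # STAR stops with an error if the --outTmpDir directory already exists,
  # so remove a leftover one from an earlier failed run first.
  rm -rf -- "$genome_dir/tmp"
  "${STAR:-STAR}" --runMode genomeGenerate \
    --genomeDir "$genome_dir" \
    --genomeFastaFiles "$fasta" \
    --outFileNamePrefix "$genome_dir/" \
    --outTmpDir "$genome_dir/tmp" \
    "$@" && rm -rf -- "$genome_dir/tmp"   # clean up the temp directory on success
}

# Example with the paths and options from the question (all dashes are plain ASCII):
# build_star_index /data_raid1_ssd/databases/genomes/pepper/star \
#   /data_raid1_ssd/databases/genomes/pepper/Annuum.v1.6.Total.fa \
#   --runThreadN 8 --genomeChrBinNbits 16 \
#   --sjdbGTFfile /data_raid1_ssd/databases/genomes/pepper/Annuum.v.2.0.chromosome.gff3 \
#   --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang 99
```

Setting STAR=echo for a single call prints the full command line instead of running it, which is a cheap way to check it before a long indexing job.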
Answer by Bastien Hervé (Limoges, CBRS, France), 8 months ago:

OP, look closely at this part of your command:

–-genomeDir /data_raid1_ssd/databases/genomes/pepper/star

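The expected -- is in fact –-: the first character is an en dash (U+2013), not a hyphen, so STAR does not read this as the --genomeDir option and falls back to the default ./GenomeDir/.

Weird, but that could be the answer to your problem.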

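Characters like the en dash in –-genomeDir are almost impossible to spot by eye. A minimal check, assuming the commands are kept in a text file: print every line that contains a byte outside printable ASCII. It relies only on a POSIX grep bracket expression evaluated in the C locale; find_non_ascii is a hypothetical helper name. Legitimate non-ASCII text, such as accented file names, is flagged too.

```shell
#!/usr/bin/env bash
# Print each line (with its line number) that contains a byte outside printable ASCII,
# such as an en dash, em dash or curly quote pasted from a word processor.
# Like grep, it returns a non-zero status when no such line is found.
# find_non_ascii is a hypothetical helper name; with no file arguments it reads stdin.
find_non_ascii() {
  LC_ALL=C grep -n '[^[:print:][:space:]]' "$@"
}
```

For example, find_non_ascii commands.txt lists the offending lines, and piping a single command into find_non_ascii checks it before pasting it into the terminal.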

It's running now. It seems to have been a copy/paste artefact. Maybe it's not a good idea to keep a record of my commands in a Word file and copy them back into the terminal. Thank you all for your input.

Reply by Ricky, 8 months ago

Turning off smart quotes and smart dashes in your keyboard preferences helps with these, but copying from a PDF generally causes the same problem.

Reply by genomax, 8 months ago
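As a complement to turning off smart punctuation, commands that have already been through a word processor or PDF can be cleaned before they are pasted. A minimal sed-based sketch; sanitize_command is a hypothetical name, and mapping an em dash to -- (rather than -) is an assumption based on editors commonly turning a typed -- into a dash. Only these six characters are handled, so the result is still worth a look before running it.

```shell
#!/usr/bin/env bash
# Replace typographic characters that word processors and PDFs substitute for plain
# ASCII (en/em dashes, curly quotes) in a command file or on stdin; writes to stdout.
# sanitize_command is a hypothetical helper name. Mapping the em dash to "--" is an assumption.
sanitize_command() {
  local endash emdash lsq rsq ldq rdq
  endash=$(printf '\342\200\223')   # U+2013 EN DASH
  emdash=$(printf '\342\200\224')   # U+2014 EM DASH
  lsq=$(printf '\342\200\230')      # U+2018 LEFT SINGLE QUOTATION MARK
  rsq=$(printf '\342\200\231')      # U+2019 RIGHT SINGLE QUOTATION MARK
  ldq=$(printf '\342\200\234')      # U+201C LEFT DOUBLE QUOTATION MARK
  rdq=$(printf '\342\200\235')      # U+201D RIGHT DOUBLE QUOTATION MARK
  sed -e "s/${endash}/-/g" -e "s/${emdash}/--/g" \
      -e "s/${lsq}/'/g" -e "s/${rsq}/'/g" \
      -e "s/${ldq}/\"/g" -e "s/${rdq}/\"/g" "$@"
}
```

For example, sanitize_command commands.txt > commands.sh turns the –-genomeDir from the question back into --genomeDir.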

Try a plain-text editor such as Notepad++!

Reply by Bastien Hervé, 8 months ago
Powered by Biostar version 2.3.0