Question

STAR genome index problem

0

Entering edit mode

7.0 years ago

Ricky ▴ 50

I am trying to generate a genome index of the pepper (capsicum annuum) genome using STAR. Its a really large genome of 3.5 GB and the genome FASTA contains 12 pseudomolecule assemblies and over 30.000 scaffolds. I call STAR with the following command:

$STAR --runMode genomeGenerate –-genomeDir /data_raid1_ssd/databases/genomes/pepper/star --genomeFastaFiles /data_raid1_ssd/databases/genomes/pepper/Annuum.v1.6.Total.fa --runThreadN 8 --genomeChrBinNbits 16 --sjdbGTFfile /data_raid1_ssd/databases/genomes/pepper/Annuum.v.2.0.chromosome.gff3 --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang 99

the processes exits with the following message:

genomeGenerate.cpp:209:genomeGenerate: exiting because of *OUTPUT FILE* error: could not create output file ./GenomeDir//chrName.txt
Solution: check that the path exists and you have write permission for this file

I am using a linux machine with 64 GB RAM which should somehow work (10 x genome size = 35 GB) and using basically the same command line to generate the Arabidopsis genome index worked fine. I guess it is the huge numer of temp files STAR generates because of the huge number of scaffolds in the genome fasta file that causes the problem, but I might be wrong. I already increased my allowed number of open files to 16384 using the

ulimit -n 16384

command but it didn't help. Is there anything I can do to tweak STAR to better deal with this large number of scaffolds or is there any other solution to the problem.

Thanks R

RNA-Seq sequencing genome • 7.2k views

ADD COMMENT • link updated 7.0 years ago by Bastien Hervé 6.4k • written 7.0 years ago by Ricky ▴ 50

2

Entering edit mode

Huh? I'm reading the error as the output folder is not writable, hence permission error. Why are we talking about memory? Maybe you're missing --outFileNamePrefix?

ADD REPLY • link 7.0 years ago by Eric Lim ★ 2.2k

0

Entering edit mode

According to this post try to tweakle the threads number too.

You can also try to increase your number of open files over 16384.

ADD REPLY • link 7.0 years ago by Bastien Hervé 6.4k

0

Entering edit mode

I've increased

ulimit -n 36000

and now used all 16 available threads. Didn't help.

ADD REPLY • link 7.0 years ago by Ricky ▴ 50

0

Entering edit mode

I used top to check for %MEM and it says only 10.8%.

ADD REPLY • link 7.0 years ago by Ricky ▴ 50

0

Entering edit mode

Your problem is not related to memory. STAR can't reach a file while running.

Can't reach file because path issue
Can't write file because permission issue
Too much open files in the same time that trigger STAR (as said above, you can try to apply a --outFileNamePrefix)

Try to not thread your command line to see if the issue still stand.

Which step of the process failed ? Also could we have the complete log file ?

ADD REPLY • link 7.0 years ago by Bastien Hervé 6.4k

0

Entering edit mode

I am not quite sure how to use the --outFileNamePrefix in context of my command line. But here is a link to Log.out file that was generated.

Log.out

Not sure which step, but happens pretty quick (ca. 2 min after starting the job).

ADD REPLY • link 7.0 years ago by Ricky ▴ 50

0

Entering edit mode

Do you have all rigths on /data_raid1_ssd/databases/genomes/pepper/star ?

Do the path /data_raid1_ssd/databases/genomes/pepper/star exists ?

Maybe rename it /data_raid1_ssd/databases/genomes/pepper/star_index/

The option outFileNamePrefix is used to output the files in an other directory than the current one. Try to set it to a directory owned by yourself

Try to remove some options to see if it affect the result, just keep it simple :

$STAR --runMode genomeGenerate –-genomeDir /data_raid1_ssd/databases/genomes/pepper/star --genomeFastaFiles /data_raid1_ssd/databases/genomes/pepper/Annuum.v1.6.Total.fa

ADD REPLY • link 7.0 years ago by Bastien Hervé 6.4k

0

Entering edit mode

Somehow your --genomeDir isn't registered. If you look closely in your Log.out, as suggested by Bastien, you'll notice the genomeDir is set back to the default value ./GenomeDir/.

A long while back, I had something similar when using nohup. While I haven't seen any such permission error in recent years, I continue to use --outFileNamePrefix for historical reason.

Below is an example. outFileNamePrefix simply is whatever I have in genomeDir + /

shell(
  """
  pigz -d -c {snakemake.input.fagz} > {fa}
  pigz -d -c {snakemake.input.gtfgz} > {gtf}
  STAR --runMode genomeGenerate          \
       --runThreadN {snakemake.threads}  \
       --genomeDir {idx}                 \
       --outFileNamePrefix {idx}/        \
       --outTmpDir {idx}/tmp/            \
       --genomeFastaFiles {fa}           \
       --sjdbGTFfile {gtf}               \
       {args}
  rm -rf {fa} {gtf} {idx}/tmp
  """)

Hope it helps. Good luck.

Btw, I see you're from CSHL. I hope you're running away from mosquitoes this time of year! :)

ADD REPLY • link 7.0 years ago by Eric Lim ★ 2.2k

score 4 · Accepted Answer · 2018-07-04

4

Entering edit mode

7.0 years ago

Bastien Hervé 6.4k

OP, in your command, look closely at :

–-genomeDir /data_raid1_ssd/databases/genomes/pepper/star

The expected -- are in fact –-

Weird, but that could be the answer to your problem

ADD COMMENT • link 7.0 years ago by Bastien Hervé 6.4k

0

Entering edit mode

Its running now. Seems to be a copy/paste artefact. Maybe its not really a good idea to keep record of my commands in a word file and copy them back into the terminal. Thank you all for your input.

ADD REPLY • link 7.0 years ago by Ricky ▴ 50

0

Entering edit mode

Turning off smart-quotes and dashes in your keyboard preference helps with these but if you copied from a PDF then this generally happens as well.