Question

STAR genomeLoad issue

1

6.8 years ago

CY ▴ 750

I ran STAR in a shared-memory environment and tried --genomeLoad LoadAndKeep, LoadAndRemove, and LoadAndExit, hoping that a single load of the reference could be shared by all samples. However, each sample still loads its own copy of the reference; memory accumulates in cache and the job is eventually killed for insufficient RAM. Can anyone share some ideas on what is going on here? Much appreciated!

By the way, what is the difference between LoadAndRemove and LoadAndExit?

RNA-Seq • 9.3k views

ADD COMMENT • link updated 6.8 years ago by h.mon 35k • written 6.8 years ago by CY ▴ 750

score 5 · Answer 1 · 2017-06-28

5

6.8 years ago

Devon Ryan 104k

LoadAndExit is convenient if you want to load the genome and then use it in separate STAR runs. It's generally the method I take, since I prefer to loop over samples and not need to keep track of which one is the first one (i.e., I call LoadAndExit first, then make a for loop over samples, and finally call Remove after the for loop).
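
A minimal sketch of that pattern, for illustration only. The names GENOME_DIR, FASTQ_DIR, STAR_BIN, and map_all are hypothetical, and the mapping call passes --genomeLoad LoadAndKeep on the assumption (confirmed further down the thread) that each mapping run has to name a shared-memory mode itself:

```shell
# Hypothetical names for illustration: GENOME_DIR, FASTQ_DIR, STAR_BIN, map_all.
GENOME_DIR="${GENOME_DIR:-/path/to/star_index}"
FASTQ_DIR="${FASTQ_DIR:-fastq}"
STAR_BIN="${STAR_BIN:-STAR}"

map_all() {
    # 1. Load the index into shared memory; no reads are mapped in this call.
    "$STAR_BIN" --genomeLoad LoadAndExit --genomeDir "$GENOME_DIR" || return 1

    # 2. Map each sample against the copy that is already loaded.
    for fq in "$FASTQ_DIR"/*.fastq; do
        [ -e "$fq" ] || continue          # skip if the glob matched nothing
        sample="$(basename "$fq" .fastq)"
        "$STAR_BIN" --genomeLoad LoadAndKeep --genomeDir "$GENOME_DIR" \
            --readFilesIn "$fq" --outFileNamePrefix "${sample}_"
    done

    # 3. Release the shared copy once every sample has been processed.
    "$STAR_BIN" --genomeLoad Remove --genomeDir "$GENOME_DIR"
}

# Usage: map_all
```

No sample has to know whether it is the first or the last one, because loading and removal both happen outside the loop.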

ADD COMMENT • link 6.8 years ago by Devon Ryan 104k

0

Actually, I was running two samples almost simultaneously. I thought that using LoadAndKeep or LoadAndExit (by the way, what is the difference between these two? I thought both load the index and keep it in cache) would let the first pipeline load the index and keep it in cache, so that the second pipeline could use it without loading it again. But my test says otherwise...

ADD REPLY • link 6.8 years ago by CY ▴ 750

0

Well, I retried it your way and it worked. Thanks Ryan!

ADD REPLY • link 6.8 years ago by CY ▴ 750

0

Hi Ryan, is it possible to check whether the genome is already loaded in memory, and if so, how?
I wrap the STAR command in a function. If I could check whether the genome is in memory, I could choose which --genomeLoad option to use, and the genome could be removed after all function calls have finished.
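
One thing that might help here, as a rough sketch only: if STAR keeps the genome in System V shared memory (an assumption about how the binary was built), the segment should show up in the output of `ipcs -m`. The helper below, with the hypothetical name list_large_shm, just filters that listing for large segments; it assumes the Linux column layout (key, shmid, owner, perms, bytes, nattch) and cannot tell whether a segment really belongs to STAR.

```shell
# Hypothetical helper: read `ipcs -m` output on stdin and print
# shmid, owner, bytes, and nattch for segments of at least MIN_BYTES.
# Assumes the Linux layout: key shmid owner perms bytes nattch [status].
list_large_shm() {
    min_bytes="${1:-1000000000}"   # default threshold: about 1 GB
    awk -v min="$min_bytes" '
        $2 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ && $5 + 0 >= min + 0 {
            print $2, $3, $5, $6
        }'
}

# Usage: ipcs -m | list_large_shm 1000000000
```

A large segment with nattch equal to 0 would be a candidate for a genome that is loaded but not currently used by any job.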

ADD REPLY • link 4.0 years ago by wm ▴ 560

0

You're advised to just use LoadAndRemove, which will leave a single copy in memory until all concurrent jobs are done.
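
A minimal sketch of what that could look like for jobs started in parallel from one script. GENOME_DIR, STAR_BIN, and map_concurrently are hypothetical names, and the FASTQ paths are whatever you pass in:

```shell
# Hypothetical names for illustration: GENOME_DIR, STAR_BIN, map_concurrently.
GENOME_DIR="${GENOME_DIR:-/path/to/star_index}"
STAR_BIN="${STAR_BIN:-STAR}"

map_concurrently() {
    # Start one STAR job per FASTQ file in the background. Every job asks
    # for LoadAndRemove, so none of them needs to know which one is last.
    for fq in "$@"; do
        sample="$(basename "$fq" .fastq)"
        "$STAR_BIN" --genomeLoad LoadAndRemove --genomeDir "$GENOME_DIR" \
            --readFilesIn "$fq" --outFileNamePrefix "${sample}_" &
    done
    # Block until all background jobs have finished.
    wait
}

# Usage: map_concurrently sampleA.fastq sampleB.fastq
```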

ADD REPLY • link 4.0 years ago by Devon Ryan 104k

0

Thanks so much, it works. After a few tests, I found that if the script is interrupted for some reason, the loaded genome stays in memory. Even when this had happened over several runs, I could clear the memory to fix it (with root).
Anyway, I can run the function in parallel now, thanks again.
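
For the interrupted-script case, one possible guard is a shell trap that asks STAR to remove the genome whenever the wrapper exits. This is a sketch under assumptions: GENOME_DIR, STAR_BIN, remove_genome, and run_with_cleanup are hypothetical names, it is meant for the case where this script is the only user of the loaded index, and a trap cannot run if the script is killed with SIGKILL.

```shell
# Hypothetical names for illustration: GENOME_DIR, STAR_BIN,
# remove_genome, run_with_cleanup.
GENOME_DIR="${GENOME_DIR:-/path/to/star_index}"
STAR_BIN="${STAR_BIN:-STAR}"

remove_genome() {
    # Ask STAR to drop the shared genome; ignore a failure here so the
    # original exit status of the wrapped command is what gets reported.
    "$STAR_BIN" --genomeLoad Remove --genomeDir "$GENOME_DIR" || true
}

run_with_cleanup() {
    (
        trap remove_genome EXIT      # runs on normal exit and on errors
        trap 'exit 130' INT TERM     # turn Ctrl-C / kill into a normal exit
        "$@"
        exit $?
    )
}

# Usage: run_with_cleanup ./map_all_samples.sh
```

The subshell keeps the traps local to the wrapped command, so they do not leak into the calling shell.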

ADD REPLY • link 4.0 years ago by wm ▴ 560

score 0 · Answer 2 · 2017-06-28

0

6.8 years ago

h.mon 35k

It seems that either you are loading the genome multiple times or this is a STAR bug. How are you running the multiple STAR runs? Which version of STAR?

LoadAndRemove will automatically remove the index from memory once all STAR jobs using it have finished. LoadAndExit will leave the index in memory until you run STAR with --genomeLoad Remove.

ADD COMMENT • link 6.8 years ago by h.mon 35k

0

So LoadAndExit is the same as LoadAndKeep? Both of these keep the index in memory until you run --genomeLoad Remove.

Also, how does STAR know when all STAR jobs finish? I mean if I write a loop, how can STAR know which is the last one?

ADD REPLY • link 6.8 years ago by CY ▴ 750

0

More or less the same. LoadAndExit does just that: it loads the genome and exits, with no mapping whatsoever. LoadAndKeep loads the genome, maps reads, and then exits, leaving the index in memory.

If you use LoadAndExit, STAR doesn't need to know: you tell STAR when to remove the index, after the loop finishes.

ADD REPLY • link 6.8 years ago by h.mon 35k

0

How do you access the loaded index in the looped call to the star aligner? It seems that STAR is not using my loaded genome correctly. I have tried several configurations of the following with and without the --genomeDir flag.

STAR --genomeLoad LoadAndExit --genomeDir $STARINDEX
for file in $(ls myFastqs/); do
    pushd myFastqs
        rm -r $file-processed
        mkdir $file-processed
        pushd $file-processed
            STAR --runThreadN 5 \
            --readFilesIn ../$file \
            --outFilterMismatchNoverLmax 0.05 \
            --alignIntronMax 20000 \
            --genomeDir $STARINDEX \
            --outSAMstrandField intronMotif \
            --quantMode GeneCounts \
            --sjdbGTFfile $STARGTF
        popd
    popd
done
STAR --genomeLoad Remove --genomeDir $STARINDEX

ADD REPLY • link 4.8 years ago by paulranum11 ▴ 80

0

It seems that STAR is not using my loaded genome correctly.

Why do you think so? Are there error messages? Note that it may be worth opening a new question, if the issue has not been solved by the suggestions in this thread.

ADD REPLY • link 4.8 years ago by h.mon 35k

0

I think that I am not properly telling STAR to use the loaded index, because when I run it as shown (with --genomeDir $STARINDEX) an index is loaded for every input FASTQ in the loop and the system runs out of memory. However, when I omit --genomeDir $STARINDEX, I get an error saying that the index was not found.

How do I properly pass a pre-loaded index to each looped call to STAR?

ADD REPLY • link 4.8 years ago by paulranum11 ▴ 80

1

You still have to specify --genomeLoad when aligning your reads.

Try this:

Load the genome index (once, before any alignment):
STAR --genomeLoad LoadAndExit --genomeDir starIndexDirectoryPath
Align your reads (you can loop at this stage and run as many alignments as you need before removing the index from memory):
STAR --genomeLoad LoadAndKeep --genomeDir starIndexDirectoryPath --runThreadN nThreads --readFilesIn /pathToReadFile --outFileNamePrefix prefix
Remove the genome index from memory:
STAR --genomeLoad Remove --genomeDir starIndexDirectoryPath

ADD REPLY • link 4.8 years ago by xenon ▴ 20

0

Hello, if I follow your instructions I get this error:

Jun 22 20:41:50 ...... FATAL ERROR, exiting
./STAR_alignment_paired_2.sh: line 17: --genomeLoad: command not found

The 17th line of my code is inside the for loop that iterates over the read pairs. So maybe the --genomeLoad option inside the loop is unnecessary?

ADD REPLY • link 3.8 years ago by Benedek Dankó ▴ 50

1

There is probably a typo in your script (you could paste lines 16 to 18 of it here).

For example, your command may span multiple lines but be missing the \ at the end of line 16.
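
A small illustration of that failure mode, with `echo` standing in for STAR so it runs anywhere. The function names broken_command and fixed_command are made up for this example:

```shell
# `echo` stands in for the real STAR call; function names are hypothetical.

# Without a trailing backslash, the shell ends the command at the line
# break and then tries to run "--genomeLoad" as a command of its own.
broken_command() {
    sh -c '
echo STAR --runThreadN 5
    --genomeLoad LoadAndKeep
' 2>&1
}

# With the backslash, both lines are joined into a single command.
fixed_command() {
    sh -c '
echo STAR --runThreadN 5 \
    --genomeLoad LoadAndKeep
' 2>&1
}

# Usage: broken_command; fixed_command
```

The broken variant reports --genomeLoad as a command that cannot be found, which is the same kind of message as in the error above.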

ADD REPLY • link 3.8 years ago by wm ▴ 560