Question: STAR genomeLoad issue
CY510 wrote:

I ran STAR in a shared memory environment and tried --genomeLoad LoadAndKeep, LoadAndRemove, and LoadAndExit, hoping that a one-time reference load could be used by all the samples. However, each sample still loads its own reference, memory accumulates in cache, and the job is eventually killed due to insufficient RAM. Can anyone share some ideas on what is going on here? Much appreciated!

By the way, what is the difference between LoadAndRemove and LoadAndExit?

rna-seq • 4.0k views
ADD COMMENTlink modified 3.1 years ago by h.mon30k • written 3.1 years ago by CY510
Devon Ryan96k wrote:

LoadAndExit is convenient if you want to load the genome and then use it in separate STAR runs. It's generally the method I take, since I prefer to loop over samples and not need to keep track of which one is the first one (i.e., I call LoadAndExit first, then make a for loop over samples, and finally call Remove after the for loop).
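
The load, loop, remove pattern described above can be sketched roughly as follows. This is only an illustration: the function name `align_with_shared_genome`, the output prefix scheme, and the `.fastq` suffix are assumptions, and using `--genomeLoad LoadAndKeep` inside the loop follows the suggestion made further down in this thread rather than anything stated in this answer.

```shell
#!/usr/bin/env bash
# Sketch: load the index once, align many samples, remove the index once.
# The function name and the output prefix scheme are illustrative only.

align_with_shared_genome() {
    local genome_dir fq status
    genome_dir=$1
    shift

    # Load the index into shared memory; this call does no mapping.
    STAR --genomeLoad LoadAndExit --genomeDir "$genome_dir" || return 1

    status=0
    for fq in "$@"; do
        # Each run attaches to the copy that is already in memory.
        STAR --genomeLoad LoadAndKeep --genomeDir "$genome_dir" \
             --readFilesIn "$fq" \
             --outFileNamePrefix "$(basename "$fq" .fastq)_" || status=1
    done

    # Release the shared memory after the last sample.
    STAR --genomeLoad Remove --genomeDir "$genome_dir"
    return "$status"
}
```

It would be called as `align_with_shared_genome /path/to/index sample1.fastq sample2.fastq`, so no sample has to be treated as "the first one".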

ADD COMMENTlink written 3.1 years ago by Devon Ryan96k

Actually, I was running two samples almost simultaneously. I thought using LoadAndKeep or LoadAndExit (by the way, what is the difference between these two? I thought both of them load the index and keep it in cache) would allow the first pipeline to load the index and keep it in cache, so that the second pipeline could use it without loading it again. But my test says otherwise...

ADD REPLYlink written 3.1 years ago by CY510

Well, I retried it in your way and it worked. Thanks Ryan!

ADD REPLYlink written 3.1 years ago by CY510

Hi Ryan, is it possible to check whether the genome is already loaded in memory, and if so, how?
I wrap the STAR command in a function. If I could check whether the genome is in memory, I could choose which --genomeLoad option to use, and remove the genome after all function calls have finished.

ADD REPLYlink modified 3 months ago • written 3 months ago by wm480

You're advised to just use LoadAndRemove, which will leave a single copy in memory until all concurrent jobs are done.
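
For concurrent jobs, the LoadAndRemove approach might look like the sketch below. The function name `run_concurrently` and the output prefix scheme are made up for illustration; only the `--genomeLoad LoadAndRemove` option comes from the advice above.

```shell
#!/usr/bin/env bash
# Sketch: start one STAR job per sample in the background, all sharing one
# copy of the index. With LoadAndRemove no explicit Remove call is made.

run_concurrently() {
    local genome_dir fq pid pids status
    genome_dir=$1
    shift

    pids=""
    for fq in "$@"; do
        STAR --genomeLoad LoadAndRemove --genomeDir "$genome_dir" \
             --readFilesIn "$fq" \
             --outFileNamePrefix "$(basename "$fq" .fastq)_" &
        pids="$pids $!"
    done

    # Wait for every job and report failure if any of them failed.
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}
```

Each job still needs its own threads and working memory, so the number of samples started at once should match what the machine can handle.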

ADD REPLYlink written 3 months ago by Devon Ryan96k

Thanks so much, it works. After a few tests, I found that if the script is interrupted for some reason, the loaded genome stays in memory. Even when this had happened over several runs, I could clear the memory to fix it (with root).
Anyway, I can run the function in parallel now. Thanks again.
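
One possible way to guard against the interrupted-script case is a shell `trap` that asks STAR to remove the genome whenever the run ends. This is a sketch under assumptions: `GENOME_DIR`, `cleanup_genome`, and `run_samples` are hypothetical names, and it presumes no other job is still using the same index when the cleanup fires.

```shell
#!/usr/bin/env bash
# Sketch: always release the shared genome, even if the run is cut short.
# GENOME_DIR is a placeholder; point it at your own index directory.
GENOME_DIR=${GENOME_DIR:-/path/to/star_index}

cleanup_genome() {
    # Ignore failures here, e.g. when the genome was never loaded.
    STAR --genomeLoad Remove --genomeDir "$GENOME_DIR" || true
}

run_samples() {
    (
        # Runs when the subshell exits for any reason, including errors.
        trap cleanup_genome EXIT
        # Turn Ctrl-C or a polite kill into a normal exit so EXIT fires.
        trap 'exit 130' INT TERM
        for fq in "$@"; do
            STAR --genomeLoad LoadAndKeep --genomeDir "$GENOME_DIR" \
                 --readFilesIn "$fq" \
                 --outFileNamePrefix "$(basename "$fq" .fastq)_" || exit 1
        done
    )
}
```

A `kill -9` cannot be trapped, so this does not cover every case. Assuming STAR was built with its default System V shared memory, `ipcs -m` should list any segments that are still allocated afterwards.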

ADD REPLYlink written 3 months ago by wm480
h.mon30k wrote:

It seems that either you are loading the genome multiple times or this is a STAR bug. How are you launching the multiple STAR runs? Which version of STAR are you using?

LoadAndRemove will automatically remove the index from memory once all STAR jobs using it finish. LoadAndExit will leave the index in memory until you run STAR with --genomeLoad Remove.

ADD COMMENTlink written 3.1 years ago by h.mon30k

So LoadAndExit is the same as LoadAndKeep? Both of these keep the index in memory until you run --genomeLoad Remove.

Also, how does STAR know when all STAR jobs finish? I mean if I write a loop, how can STAR know which is the last one?

ADD REPLYlink written 3.1 years ago by CY510

More or less the same. LoadAndExit does just that: it loads the genome and exits, with no mapping whatsoever. LoadAndKeep loads the genome, maps reads and then exits, leaving the index in memory.

If you use LoadAndExit, STAR doesn't need to know: you tell STAR when to remove the index, after the loop finishes.

ADD REPLYlink written 3.1 years ago by h.mon30k

How do you access the loaded index in the looped call to the STAR aligner? It seems that STAR is not using my loaded genome correctly. I have tried several configurations of the following, with and without the --genomeDir flag.

STAR --genomeLoad LoadAndExit --genomeDir $STARINDEX
for file in $(ls myFastqs/); do
    pushd myFastqs
        rm -r $file-processed
        mkdir $file-processed
        pushd $file-processed
            STAR --runThreadN 5 \
            --readFilesIn ../$file \
            --outFilterMismatchNoverLmax 0.05 \
            --alignIntronMax 20000 \
            --genomeDir $STARINDEX \
            --outSAMstrandField intronMotif \
            --quantMode GeneCounts \
            --sjdbGTFfile $STARGTF
        popd
    popd
done
STAR --genomeLoad Remove --genomeDir $STARINDEX
ADD REPLYlink modified 13 months ago • written 13 months ago by paulranum1160

It seems that STAR is not using my loaded genome correctly.

Why do you think so? Are there error messages? Note that it may be worth opening a new question, if the issue has not been solved by the suggestions in this thread.

ADD REPLYlink written 13 months ago by h.mon30k

I think that I am not properly telling STAR to use the loaded index, because when I run it as shown (with --genomeDir $STARINDEX) an index is loaded for every input.fastq in the loop and the system runs out of memory. However, when I omit --genomeDir $STARINDEX, I get an error saying that the index was not found.

How do I properly pass a pre-loaded index to each looped call to STAR?

ADD REPLYlink modified 13 months ago • written 13 months ago by paulranum1160

You still have to specify the --genomeLoad mode when aligning your reads.

Try this:

  1. Load the genome index (once, before the first alignment):

    STAR --genomeLoad LoadAndExit --genomeDir starIndexDirectoryPath

  2. Align your reads (you can use a loop at this stage; run as many alignments as you need before removing the index from memory):

    STAR --genomeLoad LoadAndKeep --genomeDir starIndexDirectoryPath --runThreadN nThreads --readFilesIn /pathToReadFile --outFileNamePrefix prefix

  3. To remove the genome index from memory

    STAR --genomeLoad Remove --genomeDir starIndexDirectoryPath

ADD REPLYlink modified 12 months ago • written 12 months ago by xenon10

Hello, if I follow your instructions I get this error:

Jun 22 20:41:50 ...... FATAL ERROR, exiting
./STAR_alignment_paired_2.sh: line 17: --genomeLoad: command not found

The 17th line of my code is inside the for loop which would iterate over the read pairs. So maybe the --genomeLoad command inside the loop is unnecessary?

ADD REPLYlink written 7 weeks ago by Benedek Dankó10

There is probably a typo in your script (you could paste lines 16 to 18 of your script here).

For example, your command may span multiple lines but be missing the \ at the end of line 16.
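
To illustrate how a missing backslash produces this kind of error, here is a small self-contained demonstration. It uses `echo` as a stand-in for STAR so that it runs anywhere; the option values are arbitrary.

```shell
#!/usr/bin/env bash
# echo stands in for STAR so the snippet runs without STAR installed.

# Broken: no trailing backslash, so the shell treats the second line
# as a separate command named "--genomeLoad".
broken=$(sh -c 'echo STAR --runThreadN 4
--genomeLoad LoadAndKeep' 2>&1 || true)

# Fixed: the backslash joins both lines into a single command.
fixed=$(sh -c 'echo STAR --runThreadN 4 \
--genomeLoad LoadAndKeep' 2>&1)

printf 'broken:\n%s\n\nfixed:\n%s\n' "$broken" "$fixed"
```

The exact wording of the "not found" message depends on the shell, but the broken variant always reports `--genomeLoad` as an unknown command, just like the error quoted above.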

ADD REPLYlink written 7 weeks ago by wm480