Question: STAR genomeLoad issue
2.3 years ago by
CY410
United States
CY410 wrote:

I ran STAR in a shared-memory environment and tried --genomeLoad LoadAndKeep, LoadAndRemove, and LoadAndExit, hoping that a reference loaded once could be used by all the samples. However, each sample still loads its own copy of the reference, memory accumulates in cache, and the job is eventually killed for insufficient RAM. Can anyone share some ideas about what is going on here? Much appreciated!

By the way, what is the difference between LoadAndRemove and LoadAndExit?

rna-seq • 2.6k views
ADD COMMENTlink modified 2.3 years ago by h.mon27k • written 2.3 years ago by CY410
2.3 years ago by
Devon Ryan92k
Freiburg, Germany
Devon Ryan92k wrote:

LoadAndExit is convenient if you want to load the genome and then use it in separate STAR runs. It's generally the method I take, since I prefer to loop over samples and not need to keep track of which one is the first one (i.e., I call LoadAndExit first, then make a for loop over samples, and finally call Remove after the for loop).

ADD COMMENTlink written 2.3 years ago by Devon Ryan92k

Actually, I was running two samples almost simultaneously. I thought that using LoadAndKeep or LoadAndExit would let the first pipeline load the index and keep it in cache, so that the second pipeline could use it without loading it again. (By the way, what is the difference between these two? I thought both load the index and keep it in cache.) But my test says otherwise...

ADD REPLYlink written 2.3 years ago by CY410

Well, I retried it your way and it worked. Thanks Ryan!

ADD REPLYlink written 2.3 years ago by CY410
2.3 years ago by
h.mon27k
Brazil
h.mon27k wrote:

It seems that either you are loading the genome multiple times or this is a STAR bug. How are you launching the multiple STAR runs? Which version of STAR?

LoadAndRemove will automatically remove the index from memory once all STAR jobs using it have finished. LoadAndExit will leave the index in memory until you run STAR with --genomeLoad Remove.
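
One way to check whether an index is actually sitting in shared memory is to look at the System V shared-memory segments with `ipcs -m`. The sketch below is only an illustration: `shm_total_gib` is a hypothetical helper name, it assumes the usual Linux `ipcs -m` column layout (segment size in bytes in the fifth column), and it assumes a STAR build that uses System V shared memory, which I believe is the default.

```shell
# Sum the sizes of the shared-memory segments listed by `ipcs -m`, in GiB.
# Reads `ipcs -m` text on stdin, so it also works on saved output.
# Assumption: Linux layout, i.e. rows start with a 0x... key and
# column 5 holds the segment size in bytes.
shm_total_gib() {
    awk '$1 ~ /^0x/ && $5 ~ /^[0-9]+$/ { total += $5 }
         END { printf "%.1f\n", total / (1024 * 1024 * 1024) }'
}

# Usage on Linux:
#   ipcs -m | shm_total_gib
```

After a LoadAndExit run, the total should go up by roughly the size of the index, and it should drop again after a run with --genomeLoad Remove. The `nattch` column of `ipcs -m` shows how many processes are currently attached to each segment.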

ADD COMMENTlink written 2.3 years ago by h.mon27k

So LoadAndExit is the same as LoadAndKeep? Both keep the index in memory until you run --genomeLoad Remove.

Also, how does STAR know when all STAR jobs finish? I mean if I write a loop, how can STAR know which is the last one?

ADD REPLYlink written 2.3 years ago by CY410

More or less the same. LoadAndExit does just that: it loads the index and exits, with no mapping whatsoever. LoadAndKeep loads the genome, maps the reads, and then exits, leaving the index in memory.

If you use LoadAndExit, STAR doesn't need to know which job is the last one: you tell STAR when to remove the index, after the loop finishes.

ADD REPLYlink written 2.3 years ago by h.mon27k

How do you access the loaded index in the looped call to the STAR aligner? It seems that STAR is not using my loaded genome correctly. I have tried several configurations of the following, with and without the --genomeDir flag.

STAR --genomeLoad LoadAndExit --genomeDir $STARINDEX
for file in $(ls myFastqs/); do
    pushd myFastqs
        rm -r $file-processed
        mkdir $file-processed
        pushd $file-processed
            STAR --runThreadN 5 \
            --readFilesIn ../$file \
            --outFilterMismatchNoverLmax 0.05 \
            --alignIntronMax 20000 \
            --genomeDir $STARINDEX \
            --outSAMstrandField intronMotif \
            --quantMode GeneCounts \
            --sjdbGTFfile $STARGTF
        popd
    popd
done
STAR --genomeLoad Remove --genomeDir $STARINDEX
ADD REPLYlink modified 3 months ago • written 3 months ago by paulranum1140

It seems that STAR is not using my loaded genome correctly.

Why do you think so? Are there error messages? Note that it may be worth opening a new question, if the issue has not been solved by the suggestions in this thread.

ADD REPLYlink written 3 months ago by h.mon27k

I think that I am not properly telling STAR to use the loaded index, because when I run it as shown (with --genomeDir $STARINDEX) an index is loaded for every input FASTQ file in the loop and the system runs out of memory. However, when I omit --genomeDir $STARINDEX, I get an error saying that the index was not found.

How do I properly pass a pre-loaded index to each looped call to STAR?

ADD REPLYlink modified 3 months ago • written 3 months ago by paulranum1140

You still have to pass --genomeLoad when aligning your reads.

Try this:

  1. Load the genome index once, before the first alignment:

    STAR --genomeLoad LoadAndExit --genomeDir starIndexDirectoryPath

  2. Align your reads (you can use a loop at this stage; run as many alignments as you need before removing the index from memory):

    STAR --genomeLoad LoadAndKeep --genomeDir starIndexDirectoryPath --runThreadN nThreads --readFilesIn /pathToReadFile --outFileNamePrefix prefix

  3. Remove the genome index from memory:

    STAR --genomeLoad Remove --genomeDir starIndexDirectoryPath
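
The three steps above could be wrapped into a single bash function along these lines. This is a minimal sketch, not a tested pipeline: `run_star`, `align_samples_shared`, `STAR_CMD` and `DRY_RUN` are hypothetical names introduced here (they are not STAR options), and the output prefix is simply derived from the FASTQ file name.

```shell
#!/usr/bin/env bash
# Sketch of the load-once / align-many / remove-once pattern.
# STAR_CMD and DRY_RUN are illustrative helper variables, not STAR options.
STAR_CMD="${STAR_CMD:-STAR}"

run_star() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$STAR_CMD $*"      # print the command instead of running it
    else
        "$STAR_CMD" "$@"
    fi
}

# Usage: align_samples_shared <genomeDir> <fastq> [<fastq> ...]
align_samples_shared() {
    local index_dir="$1"; shift
    local fq sample

    # 1. Load the index into shared memory once; no mapping happens here.
    run_star --genomeLoad LoadAndExit --genomeDir "$index_dir" || return 1

    # 2. Map every sample against the already-loaded index.
    for fq in "$@"; do
        sample="$(basename "$fq" .fastq)"
        run_star --genomeLoad LoadAndKeep --genomeDir "$index_dir" \
                 --readFilesIn "$fq" --outFileNamePrefix "${sample}_"
    done

    # 3. Free the shared memory, even if some alignments failed.
    run_star --genomeLoad Remove --genomeDir "$index_dir"
}
```

Calling it with `DRY_RUN=1` set only prints the STAR commands, which is a cheap way to check the order of the calls before committing memory. Other alignment options (--runThreadN and so on) would go on the LoadAndKeep line. One caveat, as far as I know: options that modify the genome on the fly, such as --sjdbGTFfile at the mapping step, cannot be combined with a shared-memory genome, so the annotation would have to be included when the index is generated.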

ADD REPLYlink modified 11 weeks ago • written 11 weeks ago by xenon10