STAR genomeLoad issue
4.9 years ago
CY ▴ 640

I ran STAR in a shared-memory environment and tried --genomeLoad LoadAndKeep, LoadAndRemove, and LoadAndExit, hoping that a one-time reference load could be used by all the samples. However, each sample still loads its own reference, memory accumulates in cache, and the job is eventually killed due to insufficient RAM. Can anyone share some ideas on what is going on here? Much appreciated!

By the way, what is the difference between LoadAndRemove and LoadAndExit?

RNA-Seq • 6.6k views
4.9 years ago

LoadAndExit is convenient if you want to load the genome and then use it in separate STAR runs. It's generally the method I take, since I prefer to loop over samples and not need to keep track of which one is the first one (i.e., I call LoadAndExit first, then make a for loop over samples, and finally call Remove after the for loop).
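
A minimal sketch of that pattern, for anyone who wants to copy it. The function name align_all, the STAR_BIN variable, and the output prefix scheme are illustrative assumptions, not part of STAR; only the --genomeLoad, --genomeDir, --readFilesIn, and --outFileNamePrefix options are STAR's own. The alignment calls use LoadAndKeep so that they attach to the copy already in shared memory instead of loading a private one.

```shell
#!/usr/bin/env bash
# STAR_BIN is a hypothetical override: set STAR_BIN=echo to print the
# commands instead of running them.
STAR_BIN="${STAR_BIN:-STAR}"

# Usage: align_all GENOME_DIR FASTQ...
align_all() {
  local genome_dir="$1"
  shift
  # Load the index into shared memory once; no reads are mapped here.
  "$STAR_BIN" --genomeLoad LoadAndExit --genomeDir "$genome_dir" || return 1
  local fq sample
  for fq in "$@"; do
    # Derive an output prefix from the file name (assumed naming scheme).
    sample="$(basename "$fq")"
    sample="${sample%%.*}"
    "$STAR_BIN" --genomeLoad LoadAndKeep --genomeDir "$genome_dir" \
      --readFilesIn "$fq" --outFileNamePrefix "${sample}_"
  done
  # Unload the shared index after the last sample.
  "$STAR_BIN" --genomeLoad Remove --genomeDir "$genome_dir"
}
```

It could be called as `align_all /path/to/starIndex myFastqs/*.fastq`, where both paths are placeholders.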

Actually, I was running two samples almost simultaneously. I thought using LoadAndKeep or LoadAndExit (by the way, what is the difference between these two? I thought both of them load the index and keep it in cache) would let the first pipeline load the index and keep it in cache, so that the second pipeline could use it without loading it again. But my test says otherwise...

Well, I retried it your way and it worked. Thanks Ryan!

Hi Ryan, is it possible to check whether the genome is loaded in memory, and if so, how?
I wrap the STAR command in a function. If it is possible to check whether the genome is in memory, I can choose which --genomeLoad option to use, and the genome can be removed after all the function calls have finished.

You're advised to just use LoadAndRemove, which will leave a single copy in memory until all concurrent jobs are done.
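
On the narrower question of checking what is resident: a rough sketch, assuming STAR was built with its default System V shared-memory backend and that `ipcs -m` prints the Linux (util-linux) column layout, where the fifth column is the segment size in bytes. The helper name count_large_segments and the size threshold are made up for illustration, and a large segment is only a hint, since `ipcs` does not say which program created it.

```shell
#!/usr/bin/env bash
# Usage: ipcs -m | count_large_segments MIN_BYTES
# Counts shared-memory segments of at least MIN_BYTES in `ipcs -m` style
# text read from stdin. Header and separator lines are skipped because
# their fifth field is not a number.
count_large_segments() {
  awk -v min="$1" '
    $5 ~ /^[0-9]+$/ && $5 + 0 >= min + 0 { n++ }
    END { print n + 0 }
  '
}
```

For example, `ipcs -m | count_large_segments 1000000000` reports how many segments are 1 GB or larger; a count above zero suggests that an index may still be loaded. A stale segment can be removed with `ipcrm -m` followed by its shmid, by its owner or by root.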

Thanks so much, it works. After a few tests, I found that if the script is interrupted for some reason, the loaded genome stays in memory. Even so, after several such runs I could clear the memory to fix it (as root).
Anyway, I can run the function in parallel now. Thanks again.
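
One way to avoid the leftover genome after an interruption is a shell trap that unloads it whenever the script ends. This is a sketch under assumptions: with_shared_genome and STAR_BIN are hypothetical names, and it relies on the explicit LoadAndExit / Remove pair rather than LoadAndRemove.

```shell
#!/usr/bin/env bash
# STAR_BIN is a hypothetical override (e.g. STAR_BIN=echo for a dry run).
STAR_BIN="${STAR_BIN:-STAR}"

# Usage: with_shared_genome GENOME_DIR COMMAND [ARGS...]
# Loads the index, runs COMMAND, and unloads the index however COMMAND ends.
with_shared_genome() {
  local genome_dir="$1"
  shift
  (
    # The EXIT trap runs on a normal exit and after the INT/TERM handlers
    # below call exit, so Ctrl-C or kill still triggers the unload.
    trap '"$STAR_BIN" --genomeLoad Remove --genomeDir "$genome_dir"' EXIT
    trap 'exit 130' INT
    trap 'exit 143' TERM
    "$STAR_BIN" --genomeLoad LoadAndExit --genomeDir "$genome_dir" || exit 1
    "$@"
  )
}
```

A call might look like `with_shared_genome /path/to/starIndex ./align_my_samples.sh`, where the script name is a placeholder. A trap cannot run if the process is killed with SIGKILL (for example by the out-of-memory killer), so manual cleanup may still be needed in that case.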

4.9 years ago
h.mon 34k

It seems that either you are loading the genome multiple times or this is a STAR bug. How are you launching the multiple STAR runs? Which version of STAR are you using?

LoadAndRemove will automatically remove the index from memory once all STAR jobs using it finish. LoadAndExit will leave the index in memory until you run STAR with --genomeLoad Remove.
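
For jobs that really do run at the same time, the LoadAndRemove behaviour could be used roughly like this. It is a sketch under assumptions: align_concurrently and STAR_BIN are made-up names, and it starts one background STAR process per file with no limit on the number of jobs, which only makes sense if the machine has enough cores and enough RAM for each process's own working memory.

```shell
#!/usr/bin/env bash
# STAR_BIN is a hypothetical override (e.g. STAR_BIN=echo for a dry run).
STAR_BIN="${STAR_BIN:-STAR}"

# Usage: align_concurrently GENOME_DIR FASTQ...
align_concurrently() {
  local genome_dir="$1"
  shift
  local fq sample
  for fq in "$@"; do
    sample="$(basename "$fq")"
    sample="${sample%%.*}"
    # Every job passes LoadAndRemove; the jobs run in the background.
    "$STAR_BIN" --genomeLoad LoadAndRemove --genomeDir "$genome_dir" \
      --readFilesIn "$fq" --outFileNamePrefix "${sample}_" &
  done
  # Block until every background job has finished.
  wait
}
```

With this option no separate Remove call is issued by the script; per the description above, STAR drops the index once the last job using it is done.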

So LoadAndExit is the same as LoadAndKeep? Both of these keep the index in memory until you run --genomeLoad Remove.

Also, how does STAR know when all STAR jobs finish? I mean if I write a loop, how can STAR know which is the last one?

More or less the same. LoadAndExit does just that: it loads the genome and exits, with no mapping whatsoever. LoadAndKeep loads the genome, maps the reads, and then exits, leaving the index in memory.

If you use LoadAndExit, STAR doesn't need to know: you tell STAR when to remove the index, after the loop finishes.

How do you access the loaded index in the looped call to the STAR aligner? It seems that STAR is not using my loaded genome correctly. I have tried several configurations of the following, with and without the --genomeDir flag.

STAR --genomeLoad LoadAndExit --genomeDir $STARINDEX
for file in $(ls myFastqs/); do
pushd myFastqs
rm -r $file-processed
mkdir $file-processed
pushd $file-processed
STAR --runThreadN 5 \
--readFilesIn ../$file \
--outFilterMismatchNoverLmax 0.05 \
--alignIntronMax 20000 \
--genomeDir $STARINDEX \
--outSAMstrandField intronMotif \
--quantMode GeneCounts \
--sjdbGTFfile $STARGTF
popd
popd
done
STAR --genomeLoad Remove --genomeDir $STARINDEX

"It seems that STAR is not using my loaded genome correctly."

Why do you think so? Are there error messages? Note that it may be worth opening a new question if the issue has not been solved by the suggestions in this thread.

I think that I am not properly telling STAR to use the loaded index, because when I run it as shown (with --genomeDir $STARINDEX), an index is loaded for every input FASTQ file in the loop and the system runs out of memory. However, when I omit --genomeDir $STARINDEX, I get an error saying that the index was not found.

How do I properly pass a pre-loaded index to each looped call to STAR?

You still have to specify --genomeLoad when aligning your reads; without it, each alignment run loads its own private copy of the index.

Try this:

1. Load the genome index into shared memory (once, before the first alignment):

STAR --genomeLoad LoadAndExit --genomeDir starIndexDirectoryPath

2. Align your reads (you can use a loop at this stage and run as many alignments as you need before removing the index from memory):

STAR --genomeLoad LoadAndKeep --genomeDir starIndexDirectoryPath --runThreadN nThreads --readFilesIn /pathToReadFile --outFileNamePrefix prefix

3. Remove the genome index from memory:

STAR --genomeLoad Remove --genomeDir starIndexDirectoryPath

Hello, if I follow your instructions I get this error:

Jun 22 20:41:50 ...... FATAL ERROR, exiting
./STAR_alignment_paired_2.sh: line 17: --genomeLoad: command not found


The 17th line of my code is inside the for loop which would iterate over the read pairs. So maybe the --genomeLoad command inside the loop is unnecessary?

There is probably a typo in your script (you could paste lines 16 to 18 of it here).

For example, your command may span multiple lines but be missing the \ at the end of line 16.
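
A related cause that is hard to spot by eye: a backslash only continues the line if it is the very last character before the newline, so a trailing space or a Windows carriage return after it breaks the command in the same way. The helper below is a hypothetical sketch (find_broken_continuations is not a standard tool) that lists such lines; it cannot detect a backslash that is missing altogether.

```shell
#!/usr/bin/env bash
# Usage: find_broken_continuations SCRIPT
# Prints "LINE:TEXT" for every line that ends in a backslash followed by
# whitespace (spaces, tabs, or a carriage return).
find_broken_continuations() {
  grep -n '\\[[:space:]][[:space:]]*$' "$1"
}
```

Running it as `find_broken_continuations ./STAR_alignment_paired_2.sh` prints nothing if no line has that problem.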
