CD-HIT uses up all RAM and then crashes
0
1
2.4 years ago
VDL ▴ 10

I'm trying to use cd-hit to cluster the BLAST NR database at a 0.9 sequence identity cutoff.

Here's what I'm running:

cd-hit -i nr -o nr90 -c 0.9 -M 1000


But even though I'm using the -M 1000 option, the command gradually uses up all the available RAM (8 GB) and then crashes. Any idea how to fix this?

2

More RAM.

If you want to cluster all of NR, you're going to need much more than 8 GB.

This strikes me as an XY problem, though: what are you actually trying to achieve?

0

I'm trying to replicate a result for a protein prediction problem that used this database. My understanding was that the -M flag was supposed to limit the amount of RAM that the program used. So it doesn't work?

0

The program needs at least a certain amount of RAM to do its job. You can't make a program that needs X GB of RAM run on less than X GB.

It's probably hitting your 1 GB limit and then crashing: the -M flag takes its value in megabytes, so -M 1000 caps cd-hit at roughly 1 GB. RAM usage can be a somewhat complex thing to monitor, and I'm not sure why it continues to use all 8 GB when you specify 1 GB, but regardless, NR is far too big to cluster with even 8 GB, I'd pretty much guarantee.
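For reference, here's a sketch of what the invocation could look like on a larger machine. The 64 GB memory cap and 8 threads are assumptions for illustration, not a tested recipe for NR:

```shell
# -M takes megabytes, so the original -M 1000 capped cd-hit at ~1 GB.
# Hypothetical run on a machine with 64 GB of RAM:
#   -M 64000  memory limit in MB
#   -T 8      number of threads
#   -n 5      word size cd-hit recommends for identity cutoffs of 0.7-1.0
cd-hit -i nr -o nr90 -c 0.9 -n 5 -M 64000 -T 8
```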

1

Maybe you want to use UniRef?

0

I'll probably switch to that if I can't get cd-hit to work on NR. Thanks for pointing it out.