Memory Error when running `scrublet`
3.7 years ago
Assa Yeroslaviz ★ 1.8k

Hi, I'm getting the following error when trying to run `scrublet` on my file:

>>> doublet_scores, predicted_doublets = scrub.scrub_doublets(min_counts=2, 
...                                                           min_cells=3, 
...                                                           min_gene_variability_pctl=85, 
...                                                           n_prin_comps=30)
Preprocessing...
/home/scrublet/helper_functions.py:321: RuntimeWarning: divide by zero encountered in true_divide
  w.setdiag(float(target_total) / tots_use)
/home/scrublet/helper_functions.py:252: RuntimeWarning: invalid value encountered in sqrt
  CV_input = np.sqrt(b);
Simulating doublets...
/home/scrublet/helper_functions.py:321: RuntimeWarning: divide by zero encountered in true_divide
  w.setdiag(float(target_total) / tots_use)
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/home/scrublet/scrublet.py", line 224, in scrub_doublets
    pipeline_zscore(self)
  File "/home/scrublet/helper_functions.py", line 65, in pipeline_zscore
    self._E_sim_norm = np.array(sparse_zscore(self._E_sim_norm, gene_means, gene_stdevs))
  File "/home/scrublet/helper_functions.py", line 173, in sparse_zscore
    return sparse_multiply((E - gene_mean).T, 1/gene_stdev).T
  File "/home/scrublet/helper_functions.py", line 164, in sparse_multiply
    return w * E
  File "/home/scipy/sparse/base.py", line 518, in __mul__
    result = self._mul_multivector(np.asarray(other))
  File "/home/scipy/sparse/base.py", line 536, in _mul_multivector
    return self.tocsr()._mul_multivector(other)
  File "/home/scipy/sparse/compressed.py", line 485, in _mul_multivector
    dtype=upcast_char(self.dtype.char, other.dtype.char))
MemoryError: Unable to allocate 167. GiB for an array with shape (1651, 13589760) and data type float64
>>>

The tool was run within a conda environment (if this makes any difference).

My data set contains:

Counts matrix shape: 6794880 rows, 31053 columns
Number of genes in gene list: 31053

Is there a way to deal with this problem?

thanks

scrublet scRNA-seq doublet • 1.5k views

Likely not. Looks like the program wants to allocate 167 GiB of memory. Does it work with smaller datasets?
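
The 167 GiB figure can be reproduced from the shape and dtype in the error message. The sketch below is only arithmetic on numbers quoted in the traceback and in the question; the variable names are illustrative and not taken from Scrublet.

```python
# Shape and dtype are copied from the MemoryError message in the question.
n_genes = 1651                # first dimension of the array
n_sim_doublets = 13_589_760   # second dimension of the array
bytes_per_float64 = 8

n_bytes = n_genes * n_sim_doublets * bytes_per_float64
gib = n_bytes / 2**30
print(f"{gib:.1f} GiB")       # 167.2 GiB

# Compare the second dimension with the number of rows in the counts matrix.
n_barcodes = 6_794_880
ratio = n_sim_doublets / n_barcodes
print(ratio)                  # 2.0
```

The second number suggests the dense array holds two simulated doublets per input barcode, which would be consistent with Scrublet's `sim_doublet_ratio` being 2 (an assumption, since the thread does not show how the `Scrublet` object was created). If so, the allocation grows linearly with the number of rows passed in.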


I would assume so, but that wouldn't help me if I can't run it on a normal single-cell sparse matrix.

I do have enough memory on my server, though. This shouldn't be a problem.


Have you tried increasing the amount of allocated memory beyond 167G + 10-20%?

scrublet also says:

When working with data from multiple samples, run Scrublet on each sample separately.


No, not yet. I don't have any memory restrictions, so it should be able to use everything on the server. I can't understand why it is restricted to begin with.

This is only one sample
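
One thing worth checking, given the numbers in the question: the `divide by zero` warning is raised while dividing by per-barcode total counts, which points at rows with zero counts, and 6,794,880 rows is far more barcodes than a single sample normally yields as cells. That would fit an unfiltered matrix that still contains empty barcodes, though the thread does not confirm it. A minimal sketch of dropping such rows before building the `Scrublet` object, using a toy matrix and a hypothetical helper `drop_low_count_barcodes` with a hypothetical `min_total_counts` threshold:

```python
import numpy as np
import scipy.sparse as sp


def drop_low_count_barcodes(counts, min_total_counts=1):
    """Keep rows (barcodes) whose total count is at least min_total_counts.

    Hypothetical helper for illustration; it is not part of Scrublet.
    """
    counts = sp.csr_matrix(counts)
    # Row sums come back as an (n, 1) matrix; flatten to a 1-D array.
    totals = np.asarray(counts.sum(axis=1)).ravel()
    keep = totals >= min_total_counts
    return counts[np.flatnonzero(keep)], keep


# Toy matrix: 5 barcodes x 3 genes, two of the barcodes are empty.
toy = sp.csr_matrix(np.array([
    [0, 0, 0],
    [3, 0, 1],
    [0, 0, 0],
    [0, 2, 0],
    [5, 1, 4],
]))

filtered, keep = drop_low_count_barcodes(toy)
print(filtered.shape)  # (3, 3)
```

A threshold of 1 only removes completely empty rows; separating real cells from background barcodes usually needs a stricter cutoff or a dedicated cell-calling step, which is outside what this sketch shows.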
