Question

Memory Error when running `scrublet`

3.7 years ago

Assa Yeroslaviz ★ 1.8k

Hi, I'm getting the following error when trying to run Scrublet on my file:

>>> doublet_scores, predicted_doublets = scrub.scrub_doublets(min_counts=2, 
...                                                           min_cells=3, 
...                                                           min_gene_variability_pctl=85, 
...                                                           n_prin_comps=30)
Preprocessing...
/home/scrublet/helper_functions.py:321: RuntimeWarning: divide by zero encountered in true_divide
  w.setdiag(float(target_total) / tots_use)
/home/scrublet/helper_functions.py:252: RuntimeWarning: invalid value encountered in sqrt
  CV_input = np.sqrt(b);
Simulating doublets...
/home/scrublet/helper_functions.py:321: RuntimeWarning: divide by zero encountered in true_divide
  w.setdiag(float(target_total) / tots_use)
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/home/scrublet/scrublet.py", line 224, in scrub_doublets
    pipeline_zscore(self)
  File "/home/scrublet/helper_functions.py", line 65, in pipeline_zscore
    self._E_sim_norm = np.array(sparse_zscore(self._E_sim_norm, gene_means, gene_stdevs))
  File "/home/scrublet/helper_functions.py", line 173, in sparse_zscore
    return sparse_multiply((E - gene_mean).T, 1/gene_stdev).T
  File "/home/scrublet/helper_functions.py", line 164, in sparse_multiply
    return w * E
  File "/home/scipy/sparse/base.py", line 518, in __mul__
    result = self._mul_multivector(np.asarray(other))
  File "/home/scipy/sparse/base.py", line 536, in _mul_multivector
    return self.tocsr()._mul_multivector(other)
  File "/home/scipy/sparse/compressed.py", line 485, in _mul_multivector
    dtype=upcast_char(self.dtype.char, other.dtype.char))
MemoryError: Unable to allocate 167. GiB for an array with shape (1651, 13589760) and data type float64
>>>

The tool was run within a conda environment (if this makes any difference).

My data set contains a counts matrix with 6794880 rows and 31053 columns. The number of genes in the gene list is 31053.

Is there a way to deal with this problem?

thanks

scrublet scRNA-seq doublet • 1.5k views

ADD COMMENT • link 3.7 years ago by Assa Yeroslaviz ★ 1.8k

Likely not. Looks like the program wants to allocate 167 GiB of memory. Does it work with smaller datasets?
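
The 167 GiB figure in the traceback can be checked by hand, since a dense float64 array needs 8 bytes per element. A minimal sketch of that arithmetic, using only the shape reported in the error message (the helper name `dense_gib` is made up for illustration):

```python
from math import prod

BYTES_PER_FLOAT64 = 8  # size of one float64 element in bytes


def dense_gib(shape, itemsize=BYTES_PER_FLOAT64):
    """Memory in GiB needed for a dense array of the given shape."""
    return prod(shape) * itemsize / 2**30


# Shape copied from the MemoryError message in the question.
needed = dense_gib((1651, 13589760))
print(f"{needed:.1f} GiB")

# Ratio of the array's second dimension to the 6794880 rows
# of the counts matrix reported in the question.
ratio = 13589760 / 6794880
print(ratio)
```

This comes out at roughly 167 GiB, matching the error. The second dimension is exactly twice the number of rows in the counts matrix, which would be consistent with the array holding two simulated doublets per observed cell, although that is an inference from the numbers rather than something the traceback states.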

ADD REPLY • link 3.7 years ago by GenoMax 141k

I would assume so, but that wouldn't help me if I can't run it on a normal single-cell sparse matrix.

I do have enough memory on my server though. This shouldn't be a problem.

ADD REPLY • link 3.7 years ago by Assa Yeroslaviz ★ 1.8k

Have you tried increasing the amount of allocated memory beyond 167 GiB + 10-20%?

scrublet also says:

When working with data from multiple samples, run Scrublet on each sample separately.

ADD REPLY • link 3.7 years ago by GenoMax 141k

No, not yet. I don't have any memory restrictions, so it should be able to use everything on the server. I can't understand why it is restricted to begin with.

This is only one sample.

ADD REPLY • link 3.7 years ago by Assa Yeroslaviz ★ 1.8k
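
The `divide by zero` warnings in the question are raised at `w.setdiag(float(target_total) / tots_use)`, which would be consistent with some rows of the counts matrix having a total count of zero. A minimal sketch, assuming that reading is right, of how one might count and drop such rows before running Scrublet. The helper name `drop_empty_rows` and the `min_total` threshold are hypothetical, and the toy matrix only stands in for the real data:

```python
import numpy as np
import scipy.sparse as sp


def drop_empty_rows(counts, min_total=1):
    """Keep only rows (cells/barcodes) whose total count is >= min_total.

    Returns the filtered CSR matrix and the boolean mask of kept rows.
    """
    counts = sp.csr_matrix(counts)
    # Row sums of a sparse matrix come back as an (n, 1) matrix; flatten them.
    totals = np.asarray(counts.sum(axis=1)).ravel()
    keep = totals >= min_total
    return counts[keep], keep


# Toy counts matrix: 4 barcodes x 3 genes, two of the barcodes are empty.
toy = sp.csr_matrix(np.array([[0, 0, 0],
                              [1, 0, 2],
                              [0, 0, 0],
                              [0, 3, 0]]))
filtered, keep = drop_empty_rows(toy)
print(f"kept {int(keep.sum())} of {toy.shape[0]} rows; new shape {filtered.shape}")
```

Whether this shrinks the matrix enough to fit in memory depends on how many of the 6794880 rows are empty, which the thread does not say.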