Question: Memory Error when running `scrublet`
Assa Yeroslaviz (Munich) wrote, 6 weeks ago:

Hi, I'm getting the following error when trying to run Scrublet on my file:

>>> doublet_scores, predicted_doublets = scrub.scrub_doublets(min_counts=2, 
...                                                           min_cells=3, 
...                                                           min_gene_variability_pctl=85, 
...                                                           n_prin_comps=30)
Preprocessing...
/home/scrublet/helper_functions.py:321: RuntimeWarning: divide by zero encountered in true_divide
  w.setdiag(float(target_total) / tots_use)
/home/scrublet/helper_functions.py:252: RuntimeWarning: invalid value encountered in sqrt
  CV_input = np.sqrt(b);
Simulating doublets...
/home/scrublet/helper_functions.py:321: RuntimeWarning: divide by zero encountered in true_divide
  w.setdiag(float(target_total) / tots_use)
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/home/scrublet/scrublet.py", line 224, in scrub_doublets
    pipeline_zscore(self)
  File "/home/scrublet/helper_functions.py", line 65, in pipeline_zscore
    self._E_sim_norm = np.array(sparse_zscore(self._E_sim_norm, gene_means, gene_stdevs))
  File "/home/scrublet/helper_functions.py", line 173, in sparse_zscore
    return sparse_multiply((E - gene_mean).T, 1/gene_stdev).T
  File "/home/scrublet/helper_functions.py", line 164, in sparse_multiply
    return w * E
  File "/home/scipy/sparse/base.py", line 518, in __mul__
    result = self._mul_multivector(np.asarray(other))
  File "/home/scipy/sparse/base.py", line 536, in _mul_multivector
    return self.tocsr()._mul_multivector(other)
  File "/home/scipy/sparse/compressed.py", line 485, in _mul_multivector
    dtype=upcast_char(self.dtype.char, other.dtype.char))
MemoryError: Unable to allocate 167. GiB for an array with shape (1651, 13589760) and data type float64
>>>
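The `divide by zero` warnings come from the total-count normalization step (`float(target_total) / tots_use`), which divides by each cell's total count. They therefore suggest that some rows of the matrix contain no counts at all, i.e. empty barcodes. Below is a minimal sketch of dropping such rows before building the `Scrublet` object. It assumes a SciPy sparse matrix with cells as rows and genes as columns, which is the layout Scrublet expects. `drop_low_count_barcodes` and its `min_total` threshold are hypothetical names and not part of Scrublet. If the input is an unfiltered barcode matrix (for example Cell Ranger's raw rather than filtered output), starting from the filtered matrix may be the simpler fix. It would also shrink the simulated-doublet matrix, whose size is proportional to the number of rows.

```python
import numpy as np
import scipy.sparse as sp


def drop_low_count_barcodes(counts, min_total=1):
    """Hypothetical helper: keep rows (barcodes) of a cells x genes matrix
    with at least `min_total` counts; also return the boolean mask of kept rows."""
    counts = sp.csr_matrix(counts)
    totals = np.asarray(counts.sum(axis=1)).ravel()  # per-barcode total counts
    keep = totals >= min_total
    return counts[keep], keep


# Toy example: 4 barcodes x 3 genes, two of which are completely empty.
toy = sp.csr_matrix(np.array([[0, 0, 0],
                              [5, 1, 0],
                              [0, 0, 0],
                              [2, 0, 3]]))
filtered, keep = drop_low_count_barcodes(toy)
print(filtered.shape)      # (2, 3)
print(int((~keep).sum()))  # 2 empty barcodes removed
```

The mask `keep` can be used to map Scrublet's per-cell scores back to the original barcode list.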

The tool was run within a conda environment (if this makes any difference).

My data set: counts matrix shape: 6794880 rows, 31053 columns; number of genes in gene list: 31053.
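The size in the `MemoryError` can be reproduced from this matrix shape. Assuming Scrublet's default `sim_doublet_ratio=2.0`, the simulated-doublet matrix has 2 × 6,794,880 = 13,589,760 rows, which matches the second dimension in the error. The first dimension, 1651, is presumably the number of genes that survived the `min_counts`/`min_cells`/`min_gene_variability_pctl` filters. A single dense float64 array of that shape needs about 167 GiB. In `sparse_zscore`, `E - gene_mean` likely already produces a dense array of the same size before the failing multiplication, so the actual peak is probably well above that one figure.

```python
# Reproduce the size of the dense array reported in the MemoryError.
n_cells = 6_794_880      # rows in the counts matrix (from the post)
sim_doublet_ratio = 2.0  # assumption: Scrublet's default value
n_genes_kept = 1651      # first dimension in the error message

n_sim = int(n_cells * sim_doublet_ratio)  # number of simulated doublets
bytes_needed = n_genes_kept * n_sim * 8   # float64 = 8 bytes per value
gib = bytes_needed / 2**30

print(n_sim)             # 13589760
print(f"{gib:.0f} GiB")  # 167 GiB
```

Because the array scales linearly with the number of rows, reducing the row count (e.g. by removing empty barcodes) shrinks it proportionally.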

Is there a way to deal with this problem?

thanks

Tags: doublet, scrublet, scrna-seq

Likely not. Looks like the program wants to allocate 167 GiB of memory. Does it work with smaller datasets?

(reply by genomax, 6 weeks ago)

I would assume so, but that wouldn't help me if I can't run it on a normal single-cell sparse matrix.

I do have enough memory on my server, though, so this shouldn't be a problem.

(reply by Assa Yeroslaviz, 6 weeks ago)

Have you tried increasing the amount of allocated memory beyond 167G (plus 10-20%)?

Scrublet's documentation also says:

When working with data from multiple samples, run Scrublet on each sample separately.

(reply by genomax, 6 weeks ago)
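For data that does combine several samples, one way to follow that advice is to split the rows by sample, score each subset on its own, and stitch the scores back together. This is a minimal sketch under stated assumptions. `sample_ids` is a hypothetical per-cell label array. `scrublet_scores` and `scores_per_sample` are illustrative helpers, not part of Scrublet. The `Scrublet(...)` and `scrub_doublets(...)` calls reuse the parameters from the original post.

```python
import numpy as np
import scipy.sparse as sp


def scrublet_scores(counts):
    """Illustrative helper: score one sample with Scrublet (parameters from the post)."""
    import scrublet as scr  # imported lazily so the splitting logic runs without it
    scrub = scr.Scrublet(counts)
    scores, _predicted = scrub.scrub_doublets(min_counts=2, min_cells=3,
                                              min_gene_variability_pctl=85,
                                              n_prin_comps=30)
    return scores


def scores_per_sample(counts, sample_ids, score_fn=scrublet_scores):
    """Run `score_fn` separately on the rows of each sample and return the
    per-cell scores in the original row order."""
    counts = sp.csr_matrix(counts)
    sample_ids = np.asarray(sample_ids)
    scores = np.full(counts.shape[0], np.nan)
    for sample in np.unique(sample_ids):
        rows = np.flatnonzero(sample_ids == sample)
        scores[rows] = score_fn(counts[rows])
    return scores


def total_counts(m):
    """Stand-in scorer for testing: total counts per cell."""
    return np.asarray(m.sum(axis=1)).ravel()


toy = sp.csr_matrix(np.arange(12).reshape(4, 3))
ids = ["A", "B", "A", "B"]
print(scores_per_sample(toy, ids, score_fn=total_counts))  # per-cell totals, original order
```

Scoring per sample also bounds memory by the largest sample rather than by the whole data set.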

No, not yet, as I don't have any memory restrictions and it should be able to use everything on the server. I can't understand why it is restricted to begin with.

This is only one sample.

(reply by Assa Yeroslaviz, 6 weeks ago)
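On why the allocation can fail without explicit restrictions: NumPy raises `MemoryError` whenever the operating system refuses the allocation request. On a Linux server, two things worth checking are the physical RAM and any per-process address-space limit (`ulimit -v`). Linux's default overcommit heuristic can also refuse a single request that is clearly larger than the available RAM plus swap. The sketch below is a rough check that uses only the standard library (`os.sysconf`, `resource.getrlimit`) and assumes a Unix-like system. The 167.2 GiB figure comes from the error message.

```python
import os
import resource


def total_ram_gib():
    """Physical RAM reported by the OS, in GiB (Unix-like systems only)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30


def address_space_limit_gib():
    """Soft RLIMIT_AS of this process in GiB, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft / 2**30


needed_gib = 167.2  # size of the failed allocation from the MemoryError
ram = total_ram_gib()
limit = address_space_limit_gib()
print(f"physical RAM: {ram:.1f} GiB")
print("address-space limit:", "unlimited" if limit is None else f"{limit:.1f} GiB")
if needed_gib > ram or (limit is not None and needed_gib > limit):
    print("a single array of this size will likely not fit")
```

Passing this check does not guarantee success, because other dense intermediates may already occupy memory when the large array is requested.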