Dear All,

How can I implement the missing-value replacement done in Perseus in a Python script?

The Perseus documentation says:

*"Missing values will be replaced by random numbers that are drawn from a normal distribution. The parameters of this distribution can be optimized to simulate a typical abundance region that the missing values would have if they had been measured. In the absence of any a priori knowledge, the distribution of random numbers should be similar to the valid values. Often, missing values represent low abundance measurements. The default values are chosen to mimic this case.*

**Parameters**

**Width**

Defines the width of the Gaussian distribution relative to the standard deviation of measured values (default: 0.3). A value of 0.5 would mean that the width of the distribution used for drawing random numbers is half of the standard deviation of the data.

**Down shift**

Specifies the amount by which the distribution used for the random numbers is shifted downwards (default: 1.8). This is in units of the standard deviation of the valid data.

**Mode**

Specifies whether the replacement of missing values should be applied to each expression column separately (default) or on the whole matrix at once (“Total matrix”)."

How is this normal distribution for the missing values generated from the width and down shift parameters?

Thanks in advance!

This is a weird answer, but Python pandas (any similarity with your icon is coincidental) has a full suite of tools for dealing with missing values, which includes:

You are looking for a method that replaces missing values with assigned random numbers. `interpolate` is also a very good method that you should look at.

Hi, thank you. I will check this out.
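As a rough illustration of the two pandas routes mentioned above, here is a minimal sketch on a made-up table. The column names, the values, and the `loc=18.0, scale=0.5` parameters are all hypothetical and only there to make the example run. Note that interpolating down a column assumes the row order is meaningful, which may not hold for a protein abundance matrix.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows are features, columns are samples.
df = pd.DataFrame({
    "sample_a": [20.1, np.nan, 22.4, 21.0],
    "sample_b": [19.8, 20.5, np.nan, np.nan],
})

# Locate and count the missing entries.
missing_mask = df.isna()
n_missing = int(missing_mask.sum().sum())

# Route 1: linear interpolation down each column (fills edge gaps too).
interpolated = df.interpolate(method="linear", limit_direction="both")

# Route 2: replace only the missing entries with random numbers.
# The distribution parameters here are arbitrary placeholders.
rng = np.random.default_rng(0)
random_values = pd.DataFrame(
    rng.normal(loc=18.0, scale=0.5, size=df.shape),
    index=df.index,
    columns=df.columns,
)
random_filled = df.where(df.notna(), random_values)
```

`df.where(cond, other)` keeps the observed values where `cond` is true and takes the value from `other` elsewhere, so measured entries are left untouched.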

Additional information for others:

The quoted text looks to be from the referenced documentation for 'Replace missing values from normal distribution'.

Has classifications:

Type: Matrix Processing

Heading: Imputation

Source code for it is linked here, in 'perseus-plugins/PerseusPluginLib/Impute/ReplaceMissingFromGaussian.cs'

Some coverage of imputation methods as implemented, with accompanying Python code:

I note that the method described using R under 'Missing value imputation' on page 23 of this biological paper pre-print by Michaelis et al. looks similar to what is described as being used by Perseus. It looks like SciPy has `scipy.stats.truncnorm`, which is similar to the function “rtruncnorm” from the R library “truncnorm” that was used in part of the two-tiered approach there.
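For anyone trying `scipy.stats.truncnorm`, a small sketch of how it can be called follows. This is not the procedure from the pre-print; the helper name `draw_truncated_normal` and all numbers are hypothetical. The one thing worth knowing is that SciPy expects the truncation bounds `a` and `b` in standardized units, that is, relative to `loc` and `scale`, rather than on the data scale.

```python
import numpy as np
from scipy.stats import truncnorm


def draw_truncated_normal(mean, sd, lower, upper, size, seed=None):
    """Draw from a normal(mean, sd) restricted to [lower, upper].

    Hypothetical helper: converts data-scale bounds into the
    standardized bounds that scipy.stats.truncnorm expects.
    """
    a = (lower - mean) / sd
    b = (upper - mean) / sd
    return truncnorm.rvs(a, b, loc=mean, scale=sd, size=size, random_state=seed)


# Example with made-up numbers: no lower bound, upper bound at 19.0.
draws = draw_truncated_normal(
    mean=18.0, sd=0.5, lower=-np.inf, upper=19.0, size=1000, seed=0
)
```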

Hi Wayne, thanks for the documentation link and the source code. Working back from the source code (replacing missing values for the whole matrix), I have another question. It looks like the random numbers are drawn from a Gaussian distribution with mean (m) and standard deviation (s), where

m = mean of all values of the matrix - shift (1.8) * std. dev. of all values of the matrix

s = std. dev. of all values of the matrix * width (0.3)
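The two formulas above can be sketched in Python roughly as follows. This is a minimal sketch rather than a port of the Perseus source: the function names are made up, and it assumes that missing values are stored as NaN (or non-finite numbers), that the statistics are taken over the valid values only, and that the sample standard deviation (`ddof=1`) is used. Those assumptions should be checked against the C# code. The `mode` argument mirrors the two documented modes, per column or "Total matrix".

```python
import numpy as np
import pandas as pd


def _draw_replacements(valid, n, width, down_shift, rng):
    """Draw n values from the down-shifted, narrowed Gaussian."""
    sd = valid.std(ddof=1)            # assumption: sample standard deviation
    m = valid.mean() - down_shift * sd
    s = sd * width
    return rng.normal(m, s, size=n)


def impute_from_gaussian(df, width=0.3, down_shift=1.8, mode="column", seed=None):
    """Replace missing values with draws from a normal distribution.

    mode="column": parameters estimated per column.
    mode="total":  parameters estimated from the whole matrix.
    """
    rng = np.random.default_rng(seed)
    data = df.to_numpy(dtype=float, copy=True)
    missing = ~np.isfinite(data)      # NaN and +/-inf count as missing here

    if mode == "total":
        data[missing] = _draw_replacements(
            data[~missing], int(missing.sum()), width, down_shift, rng
        )
    elif mode == "column":
        for j in range(data.shape[1]):
            col_missing = missing[:, j]
            data[col_missing, j] = _draw_replacements(
                data[~col_missing, j], int(col_missing.sum()), width, down_shift, rng
            )
    else:
        raise ValueError("mode must be 'column' or 'total'")

    return pd.DataFrame(data, index=df.index, columns=df.columns)


# Usage on synthetic data (hypothetical log-intensities, ~10% missing).
rng = np.random.default_rng(1)
example = pd.DataFrame(rng.normal(25, 2, size=(200, 3)), columns=["s1", "s2", "s3"])
example = example.mask(rng.random(example.shape) < 0.1)
imputed = impute_from_gaussian(example, mode="total", seed=0)
```

A column with fewer than two valid values would give an undefined standard deviation in this sketch, so such columns would need separate handling.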

Now, for example, suppose I want to impute the missing values of a z-score matrix. Since the mean of the z-scores is always close to 0 and the standard deviation is always 1, m is about -1.8 and s is about 0.3, so the Gaussian random numbers drawn using m and s are practically always negative. Is this normal?

I ran 10,000 random draws, and the histogram of the drawn values is attached.
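The z-score case can be reproduced with a few lines. This is only a sketch of the experiment described above, assuming a matrix mean of exactly 0 and a standard deviation of exactly 1. With those values zero lies 6 standard deviations above the mean of the replacement distribution, so the chance of a positive draw is on the order of 1e-9.

```python
import math
import numpy as np

width, down_shift = 0.3, 1.8
z_mean, z_sd = 0.0, 1.0            # assumed statistics of a z-scored matrix

m = z_mean - down_shift * z_sd     # mean of the replacement distribution
s = z_sd * width                   # its standard deviation

rng = np.random.default_rng(0)
draws = rng.normal(m, s, size=10_000)
fraction_negative = float(np.mean(draws < 0))

# Exact probability that a single draw is positive: P(X > 0) for X ~ N(m, s).
prob_positive = 0.5 * math.erfc((0.0 - m) / (s * math.sqrt(2.0)))
```

Negative replacements are therefore the expected outcome on z-scored data: the down shift is designed to place the imputed values below the bulk of the measured ones.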

I'd have to look much further into this to know what is typical. Would it be easier to check your results by seeing whether they roughly match the output that Perseus gives for the same input? If so, then you are on the right track. Unfortunately, I don't have a place to easily set up Perseus. Unless I missed something, it seems to work only on Windows machines?

Dear Wayne, thanks for your comment. I have verified this with Perseus data, and yes, the Perseus output closely matches the output of the Python script. Thanks again!!