**Let's start with the calculation of the connectivity scores (*S*_{i}):**

For an instance *i* (i.e. a perturbagen under specific conditions: cell line, dose and time), the final score *S*_{i} is its "preliminary" score *s*_{i}, normalized using the preliminary scores of all instances:

For positively connected perturbagens (instances with *s*_{i} > 0), *s*_{i} is divided by the score of the most positively connected instance:

*S*_{i} = *s*_{i} / (max_{k ∈ all instances}(*s*_{k}))

For negatively connected perturbagens (*s*_{i} < 0), it is divided by *minus* the score of the most negatively connected one, so that *S*_{i} stays negative:

*S*_{i} = *s*_{i} / (-min_{k ∈ all instances}(*s*_{k}))
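Written as a single piecewise formula, the two normalization rules read as follows. The *s*_{i} = 0 case is added here only for completeness (such an instance is neither positively nor negatively connected); the text above covers just the positive and negative cases.

$$
S_i =
\begin{cases}
\dfrac{s_i}{\max_k s_k} & \text{if } s_i > 0,\\[6pt]
\dfrac{s_i}{-\min_k s_k} & \text{if } s_i < 0,\\[6pt]
0 & \text{if } s_i = 0.
\end{cases}
$$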

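Where up_{i} and down_{i} are the KS scores of the query's up- and down-regulated tag lists for instance *i*: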

*s*_{i} = up_{i} - down_{i}
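A minimal sketch of both steps, assuming up_{i} and down_{i} are already computed for every instance. The function names and toy values are hypothetical. The original CMap paper (Lamb et al., 2006) additionally sets *s*_{i} to zero when up_{i} and down_{i} have the same sign; here that rule is an optional flag (`zero_same_sign`).

```python
from typing import List


def preliminary_scores(up: List[float], down: List[float],
                       zero_same_sign: bool = True) -> List[float]:
    """s_i = up_i - down_i for each instance.

    If zero_same_sign is True, s_i is set to 0 when up_i and down_i have
    the same sign (the rule described in Lamb et al., 2006).
    """
    scores = []
    for u, d in zip(up, down):
        if zero_same_sign and u * d > 0:
            scores.append(0.0)
        else:
            scores.append(u - d)
    return scores


def connectivity_scores(s: List[float]) -> List[float]:
    """Normalize preliminary scores s_i into connectivity scores S_i."""
    s_max = max(s)  # most positively connected instance
    s_min = min(s)  # most negatively connected instance
    result = []
    for x in s:
        if x > 0:
            result.append(x / s_max)   # positive: divide by the maximum
        elif x < 0:
            result.append(x / -s_min)  # negative: divide by minus the minimum
        else:
            result.append(0.0)         # neither positively nor negatively connected
    return result


# Toy values for four hypothetical instances (not real CMap data).
up = [0.6, 0.3, -0.2, 0.4]
down = [-0.2, -0.1, 0.5, 0.3]
s = preliminary_scores(up, down)
print([round(x, 3) for x in connectivity_scores(s)])  # [1.0, 0.5, -1.0, 0.0]
```

The fourth toy instance has up and down scores of the same sign, so it ends up with a score of zero.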

In your example:

*S*_{5941} = 0.629,
*S*_{5968} = 0.593,
*S*_{5963} = 0.585,
*S*_{5936} = 0.580

**The enrichment score is based on ranks (a Kolmogorov-Smirnov statistic)**

The connectivity scores *S*_{i} are used to sort the list of all instances (perturbagens); if two instances have the same *S*_{i}, the one with the higher up_{i} is placed higher. The position in this list gives the rank: in your example, the H-7 instances got ranks 174, 305, 339 and 368 (the higher the connectivity score, the higher the position on the list, i.e. the lower the rank).
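The ordering rule can be sketched with a single sort key: *S*_{i} descending, then up_{i} descending to break ties. The tuple layout `(instance_id, S_i, up_i)` and the instance names are hypothetical, chosen for illustration.

```python
from typing import Dict, List, Tuple


def rank_instances(instances: List[Tuple[str, float, float]]) -> Dict[str, int]:
    """Return 1-based ranks for (instance_id, S_i, up_i) tuples.

    Sorting is by S_i descending; ties on S_i are broken by up_i descending.
    """
    ordered = sorted(instances, key=lambda inst: (-inst[1], -inst[2]))
    return {inst_id: pos for pos, (inst_id, _, _) in enumerate(ordered, start=1)}


# Hypothetical instances: "a" and "c" tie on S_i, so up_i decides.
ranks = rank_instances([("a", 0.5, 0.3), ("b", 0.9, 0.1), ("c", 0.5, 0.6)])
print(ranks)  # {'b': 1, 'c': 2, 'a': 3}
```

Python's sort is stable, so instances that tie on both *S*_{i} and up_{i} keep their input order.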

This list has a total length of 6100 (the number of instances in the old CMap).

Once the ordering is ready, we can pose the following question:

*Are the chosen instances accumulated near the top of the sorted list of all instances?*

and use the Kolmogorov-Smirnov (KS) statistic to assess that. A slightly simplified version is to look at the maximum of the absolute differences between:

- a hypothetical, even distribution along the list (let's call it *j*), and
- the real distribution of the analyzed perturbagens (let's call it *Vj*)

As there are four instances considered, the distribution *j* would simply be:

1/4, 2/4, 3/4 and 4/4 (or [0.25, 0.50, 0.75, 1.00])

while the real distribution *Vj* (each rank divided by the list length, 6100) is [174/6100, 305/6100, 339/6100, 368/6100], or [0.0285, 0.0500, 0.0556, 0.0603].

When we subtract the two cumulative distributions, |*j* - *Vj*| (NB: a nice property of ranks is that they give us cumulative distributions directly), we get:

[0.2215, 0.4500, 0.6944, 0.9397]

The maximum of those is 0.9397 ≈ 0.94. This is your enrichment score!
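The simplified calculation fits in a few lines. This is a sketch of the simplified version only (maximum of |*j*/*t* - *V*(*j*)/*n*| over the *t* chosen instances, with *n* the list length); the function name is hypothetical.

```python
from typing import List


def simplified_enrichment(ranks: List[int], n: int) -> float:
    """Simplified KS-like score: max over j of |j/t - V(j)/n|.

    ranks: positions of the chosen instances in the sorted list (1-based)
    n: total number of instances in the list
    """
    v = sorted(ranks)
    t = len(v)
    return max(abs(j / t - r / n) for j, r in enumerate(v, start=1))


score = simplified_enrichment([174, 305, 339, 368], 6100)
print(round(score, 4))  # 0.9397
```

With the H-7 ranks, this reproduces the 0.9397 from the worked example.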

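As I mentioned earlier, this is a simplification: the proper KS calculation also measures the deviation in the "negative" direction, where one is subtracted from the position index (*j* - 1), and reports that value with a minus sign when it is the larger of the two. For detailed formulas, see this chapter of the documentation.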
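A sketch of the signed KS statistic as defined in the original CMap paper (Lamb et al., 2006): *a* = max_{j}(*j*/*t* - *V*(*j*)/*n*), *b* = max_{j}(*V*(*j*)/*n* - (*j* - 1)/*t*), and the score is *a* if *a* > *b*, otherwise -*b*. The (*j* - 1) term is the "subtract one" mentioned above. Function name and the second set of ranks are hypothetical.

```python
from typing import List


def ks_enrichment(ranks: List[int], n: int) -> float:
    """Signed KS statistic for a set of instances, following Lamb et al. (2006).

    a: largest amount by which j/t exceeds V(j)/n (enrichment near the top)
    b: largest amount by which V(j)/n exceeds (j - 1)/t (enrichment near the bottom)
    """
    v = sorted(ranks)
    t = len(v)
    a = max(j / t - r / n for j, r in enumerate(v, start=1))
    b = max(r / n - (j - 1) / t for j, r in enumerate(v, start=1))
    return a if a > b else -b


print(round(ks_enrichment([174, 305, 339, 368], 6100), 4))     # 0.9397
print(round(ks_enrichment([6000, 6050, 6080, 6100], 6100), 4))  # -0.9836
```

For the H-7 ranks, *b* is tiny (174/6100), so the result matches the simplified score; instances clustered at the bottom of the list give a negative score instead.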

P.S. This plot may help to understand the KS statistic:

Do you happen to know what happens if we score a query signature with only one sign for all genes? In that case, the calculation of what you refer to as up_i (and the authors refer to as ks_i) is not possible, for example in a signature containing only negatives. Should we simply substitute zero? To clarify in the language of the original paper: the up tag list would be empty in such a case, so the a and b calculations for it are not possible.