Question

Off topic:[Excel] Generate a similarity matrix on non-continuous values, ignoring empty cells

0

Entering edit mode

8.9 years ago

Kame ▴ 20

Dear all,

I have this matrix of non-continuous, independent data (let's call them "a", "b", "c").

Variable 1	Variable 2	Variable 3
Sample 1	a	a	a
Sample 2	b	b	e
Sample 3	c	a	a

Values are not continous numbers, or measures, but more of a label for each "Variable". Additionally, "a" in Variable 1 does not relate to "a" in Variable 2.

I would like a way to assess similarity (=shared values) between "Samples". I don't care to know which "Variable" is similar or different between the two "Samples", just the number of shared values is fine.

For the example above, we see that:
- Sample 1 has 2 shared value with Sample 3 (for Variable 2)
- Sample 2 has 0 shared value with Sample 1 for any variable
- Sample 3 has 0 shared values with Sample 2
- Each sample has 3 shared values with itself

In that example, Sample 1 and 3 are more similar to each other than Sample 2 (if we exclude self-similarity).

From this data, I would like to create a "similarity" matrix of shared values:

	Sample 1	Sample 2	Sample 3
Sample 1	3	0	2
Sample 2	0	3	0
Sample 3	2	0	3

Using SUM and INDEX and MATCH, I managed to make this matrix above using:

=SUM(1*(INDEX(<data>,MATCH(<first header in column of the similarity matrix>,<first column, headers of data>,0),)=INDEX(<data>,MATCH(<first header in row of the similarity matrix>,<headers of data>,0),))) entered as an array.

But I realized that when one value is missing (empty cell), it is still considered and counted as similar as another empty cell. I am struggling to adapt the formula to ignore empty cells and would really be grateful for some advice.

Thank you very much

Excel • 4.3k views

ADD COMMENT • link 8.9 years ago by Kame ▴ 20