Off topic:[Excel] Generate a similarity matrix on non-continuous values, ignoring empty cells
0
0
Entering edit mode
8.9 years ago
Kame ▴ 20

Dear all,

I have this matrix of non-continuous, independent data (let's call them "a", "b", "c").

Variable 1	Variable 2	Variable 3
Sample 1	a	a	a
Sample 2	b	b	e
Sample 3	c	a	a

Values are not continous numbers, or measures, but more of a label for each "Variable". Additionally, "a" in Variable 1 does not relate to "a" in Variable 2.

I would like a way to assess similarity (=shared values) between "Samples". I don't care to know which "Variable" is similar or different between the two "Samples", just the number of shared values is fine.

For the example above, we see that:
- Sample 1 has 2 shared value with Sample 3 (for Variable 2)
- Sample 2 has 0 shared value with Sample 1 for any variable
- Sample 3 has 0 shared values with Sample 2
- Each sample has 3 shared values with itself

In that example, Sample 1 and 3 are more similar to each other than Sample 2 (if we exclude self-similarity).

From this data, I would like to create a "similarity" matrix of shared values:

	Sample 1	Sample 2	Sample 3
Sample 1	3	0	2
Sample 2	0	3	0
Sample 3	2	0	3

Using SUM and INDEX and MATCH, I managed to make this matrix above using:

=SUM(1*(INDEX(<data>,MATCH(<first header in column of the similarity matrix>,<first column, headers of data>,0),)=INDEX(<data>,MATCH(<first header in row of the similarity matrix>,<headers of data>,0),))) entered as an array.

But I realized that when one value is missing (empty cell), it is still considered and counted as similar as another empty cell. I am struggling to adapt the formula to ignore empty cells and would really be grateful for some advice.

Thank you very much

Excel • 4.3k views
ADD COMMENT
This thread is not open. No new answers may be added
Traffic: 2007 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6