Dear all,
I have this matrix of non-continuous, independent data (let's call them "a", "b", "c").
Variable 1 Variable 2 Variable 3
Sample 1 a a a
Sample 2 b b e
Sample 3 c a a
Values are not continous numbers, or measures, but more of a label for each "Variable". Additionally, "a" in Variable 1 does not relate to "a" in Variable 2.
I would like a way to assess similarity (=shared values) between "Samples". I don't care to know which "Variable" is similar or different between the two "Samples", just the number of shared values is fine.
For the example above, we see that:
- Sample 1 has 2 shared value with Sample 3 (for Variable 2)
- Sample 2 has 0 shared value with Sample 1 for any variable
- Sample 3 has 0 shared values with Sample 2
- Each sample has 3 shared values with itself
In that example, Sample 1 and 3 are more similar to each other than Sample 2 (if we exclude self-similarity).
From this data, I would like to create a "similarity" matrix of shared values:
Sample 1 Sample 2 Sample 3
Sample 1 3 0 2
Sample 2 0 3 0
Sample 3 2 0 3
Using SUM and INDEX and MATCH, I managed to make this matrix above using:
=SUM(1*(INDEX(<data>,MATCH(<first header in column of the similarity matrix>,<first column, headers of data>,0),)=INDEX(<data>,MATCH(<first header in row of the similarity matrix>,<headers of data>,0),))) entered as an array.
But I realized that when one value is missing (empty cell), it is still considered and counted as similar as another empty cell. I am struggling to adapt the formula to ignore empty cells and would really be grateful for some advice.
Thank you very much