Question: Consistency of group memberships across variables
2.7 years ago by mforde84 (1.2k) wrote:


I did some similarity network fusion with mRNA and miRNA data, and I'm generating a variety of candidate clusterings, each consisting of between 2 and 5 clusters. I'm interested in testing for membership similarity between multiple categorical variables, in particular between those clusterings that predict the same optimal number of clusters.

For instance, let's say that two categorical variables with 3 levels have the following membership:



I want to test how consistently samples group together across these variables. The names of the levels (1, 2, 3) are irrelevant and strictly qualitative. In this instance, it would be a perfect match because the 2 in one variable matches bidirectionally to the 1 in the other.

Is there a statistical test that I can apply to test this? I had read that chi-square might be appropriate, but I'm still a little fuzzy on how to interpret it in my application, since I don't think it accounts for the semantic equivalence between 1 and 2 in the different groups.

Any suggestions?

membership • 841 views
modified 2.6 years ago by Jean-Karim Heriche (21k) • written 2.7 years ago by mforde84 (1.2k)

Well? Does anyone have any suggestions? I mean, come on now, this isn't Stack Exchange, guys.

written 2.7 years ago by mforde84 (1.2k)

The simplest thing to do is to make a stacked bar plot of your data and look at the grouped distribution. You should recode your samples to avoid the semantic issues. I don't think any statistic will 'help' you in this matter. At this point your data seem to be purely frequencies over a small number of groups and a small number of samples...

written 2.6 years ago by theobroma22 (1.1k)

Sounds reasonable. If I could recode them properly, I could even do a contingency table.
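
For illustration, a minimal sketch of such a contingency table in plain Python. The membership vectors `var_a` and `var_b` are made-up data, not the example from the question, and `contingency_table` and `is_perfect_match` are hypothetical helper names. Note that the check only looks at which cells are non-zero, so it does not care what the levels are called.

```python
from collections import Counter

# Hypothetical memberships for 8 samples; the label values are arbitrary.
var_a = [1, 1, 1, 2, 2, 3, 3, 3]
var_b = [2, 2, 2, 1, 1, 3, 3, 3]


def contingency_table(a, b):
    """Cross-tabulate two label vectors (rows: levels of a, columns: levels of b)."""
    rows = sorted(set(a))
    cols = sorted(set(b))
    counts = Counter(zip(a, b))  # missing pairs count as 0
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]


def is_perfect_match(table):
    """True when every row and every column has exactly one non-zero cell,
    i.e. the two variables define the same grouping up to relabeling."""
    rows_ok = all(sum(1 for x in row if x) == 1 for row in table)
    cols_ok = all(sum(1 for x in col if x) == 1 for col in zip(*table))
    return rows_ok and cols_ok


rows, cols, table = contingency_table(var_a, var_b)
for level, row in zip(rows, table):
    print(level, row)
print("perfect match:", is_perfect_match(table))
```

With these made-up vectors, level 1 of the first variable lines up entirely with level 2 of the second and vice versa, so the table has one non-zero cell per row and column.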

written 2.6 years ago by mforde84 (1.2k)
2.6 years ago by Jean-Karim Heriche (21k), EMBL Heidelberg, Germany, wrote:

Your problem amounts to measuring similarity between sets. There are plenty of similarity measures for sets (e.g. Jaccard index), and you can get a p-value for the overlap between two sets using the hypergeometric distribution.
As for the semantic relationship, only you can tell how to account for it since we have no information on this. The standard way of dealing with semantic relations is through ontologies.
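
As a rough sketch of the two ideas above, assuming each cluster is treated as a set of sample IDs: the Jaccard index of two clusters, and an upper-tail hypergeometric p-value for their overlap. The function names and the sample IDs are hypothetical, and the p-value is for a single pair of clusters, with no correction for testing many pairs.

```python
from math import comb


def jaccard(s, t):
    """Jaccard index |S & T| / |S | T| of two collections, treated as sets."""
    s, t = set(s), set(t)
    union = s | t
    return len(s & t) / len(union) if union else 1.0


def hypergeom_overlap_pvalue(n_total, size_a, size_b, overlap):
    """P(X >= overlap), where X is the number of members of a fixed set of
    size_a that land in a random draw of size_b from n_total samples."""
    upper = min(size_a, size_b)
    total = comb(n_total, size_b)
    tail = sum(
        comb(size_a, k) * comb(n_total - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 8 samples, one cluster from each variable.
cluster_a = {"s1", "s2", "s3"}   # e.g. level 1 of the first variable
cluster_b = {"s1", "s2", "s3"}   # e.g. level 2 of the second variable

print(jaccard(cluster_a, cluster_b))
p = hypergeom_overlap_pvalue(8, len(cluster_a), len(cluster_b),
                             len(cluster_a & cluster_b))
print(p)  # 1/56, roughly 0.018
```

If SciPy is available, `scipy.stats.hypergeom.sf(overlap - 1, n_total, size_a, size_b)` should give the same upper tail.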

written 2.6 years ago by Jean-Karim Heriche (21k)