Question

Evaluate the precision of disease-disease associations


9.7 years ago

kim ▴ 70

Hi all,

Simply put, I have a disease-disease association matrix that was constructed from GO pathways. For example, I calculated the semantic similarity between disease A and disease B from 100 functional pathways in disease A and 50 functional pathways in disease B. The comparison for that pair therefore contains 100 * 50 pathway-level similarity values, each between 0 and 1, which are summarized into one overall score (e.g., the overall similarity score between Alzheimer's disease and Parkinson's disease is 0.5).
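For illustration, one common way to collapse a block of pathway-level similarities into a single disease-disease score is a best-match average. This is an assumption made only for this sketch: the post does not say which aggregation was used, and `best_match_average` and the random toy matrix are hypothetical.

```python
import random


def best_match_average(sim):
    """Collapse a pathway-by-pathway similarity matrix into one score.

    sim is a list of rows; sim[i][j] is the similarity (0 to 1) between
    pathway i of disease A and pathway j of disease B.
    """
    # Best partner in disease B for each pathway of disease A.
    row_best = [max(row) for row in sim]
    # Best partner in disease A for each pathway of disease B.
    col_best = [max(col) for col in zip(*sim)]
    return (sum(row_best) + sum(col_best)) / (len(row_best) + len(col_best))


# Synthetic stand-in for a 100-by-50 block of pathway similarities.
random.seed(0)
toy = [[random.random() for _ in range(50)] for _ in range(100)]
score = best_match_average(toy)
print(f"overall similarity: {score:.3f}")
```

Other aggregations (maximum, plain mean) give different score distributions, so whichever one is used for the real pairs should also be used for the random pairs of any null model.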

Here I am facing a challenge: how to set a cut-off and how to assess these scores statistically, for example with p-values. I searched online and found two papers whose methods look suitable. The first is http://www.sciencedirect.com/science/article/pii/S1532046411002073 and the second is http://www.pnas.org/content/105/52/20870.long.

In particular, one of them states: "~ scores were converted to p-values by comparing them with a corresponding null model - 20,000 random disease pairs in DO-Lite. Cross-validation on a benchmark set of diseases was used to determine the optimal combination of disease similarity p-value cut-off and hypergeometric cutoff for process enrichment".
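The null-model idea in the quote can be sketched as an empirical p-value: the fraction of random-pair scores that are at least as large as the observed score. The following is a minimal sketch, not the paper's implementation; `empirical_p_values` is a hypothetical helper, and the null scores are synthetic stand-ins for the scores of randomly drawn disease pairs.

```python
import bisect
import random


def empirical_p_values(observed, null_scores):
    """One-sided empirical p-values against a null distribution of scores.

    For each observed score s, p = (#null >= s + 1) / (n_null + 1).
    The add-one correction keeps p from ever being exactly zero.
    """
    null_sorted = sorted(null_scores)
    n = len(null_sorted)
    p_values = []
    for s in observed:
        # bisect_left returns the index of the first null score >= s,
        # so everything from that index onward is at least as large as s.
        n_ge = n - bisect.bisect_left(null_sorted, s)
        p_values.append((n_ge + 1) / (n + 1))
    return p_values


# Synthetic null: 20,000 made-up scores in [0, 1] standing in for the
# similarity scores of random disease pairs.
random.seed(1)
null_scores = [random.betavariate(2, 5) for _ in range(20000)]
p_vals = empirical_p_values([0.5, 0.9], null_scores)
print(p_vals)
```

With 20,000 null scores the smallest attainable p-value is 1/20001, so the size of the null sample limits how small a cut-off can meaningfully be.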

Any suggestions on how to convert specific scores to p-values and how to validate them?
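As a rough illustration of validating a cut-off, the p-values could be thresholded and the resulting calls compared with a benchmark set of disease pairs known to be related. The sketch below scans a few cut-offs and reports precision and recall. It is simpler than the cross-validation mentioned in the quote, and the pair labels, p-values, and benchmark set are all made up.

```python
def precision_recall_at_cutoff(p_values, known_pairs, cutoff):
    """Precision and recall of calls with p <= cutoff against a benchmark.

    p_values: dict mapping a disease pair (tuple) to its p-value.
    known_pairs: set of benchmark pairs treated as true associations.
    """
    called = {pair for pair, p in p_values.items() if p <= cutoff}
    true_pos = len(called & known_pairs)
    precision = true_pos / len(called) if called else float("nan")
    recall = true_pos / len(known_pairs) if known_pairs else float("nan")
    return precision, recall


# Hypothetical p-values for five disease pairs (labels are placeholders).
toy_p = {
    ("A", "B"): 0.001,
    ("A", "C"): 0.03,
    ("A", "D"): 0.04,
    ("B", "C"): 0.2,
    ("C", "D"): 0.6,
}
# Hypothetical benchmark of pairs assumed to be truly related.
toy_known = {("A", "B"), ("A", "D"), ("C", "D")}

for cutoff in (0.01, 0.05, 0.5):
    prec, rec = precision_recall_at_cutoff(toy_p, toy_known, cutoff)
    print(f"cutoff={cutoff}: precision={prec:.2f}, recall={rec:.2f}")
```

A stricter cut-off usually trades recall for precision, so the choice depends on which kind of error matters more for the downstream analysis.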

I found the DO-Lite website: http://django.nubic.northwestern.edu/fundo/databrowse/.

However, I have no idea how to extract the similarity scores between diseases from it and apply them to my results. I would also welcome any alternative approaches to this problem.

Thanks!

R • 1.8k views

ADD COMMENT • link updated 2.4 years ago by Ram 43k • written 9.7 years ago by kim ▴ 70


Out of curiosity, what are you trying to achieve with this? GO annotations aren't exactly evenly distributed, so I expect that even comparing scores to those from the null model won't be truly informative (e.g., any two diseases that affect the same tissue will likely be significant, but is that really telling you anything?).

ADD REPLY • link updated 2.4 years ago by Ram 43k • written 9.7 years ago by Devon Ryan 104k


Thanks Devon for your comments!

Actually, I am not sure I follow you completely, because my knowledge here is limited. Yes, you are right: given the uneven distribution of GO terms, a null-model-based method may not be a sensible way to check significance. However, my data at this step do not deal with GO terms directly. Rather, I am trying to look at the general behavior of the semantic scores, which were calculated from GO terms, by converting the scores (from zero to one) to p-values. I am not sure how the authors of the paper I linked above converted the semantic scores to p-values using disease pairs in DO-Lite. For me, evaluating a semantic similarity measure is a challenging task, and I still don't know the best way to do it.

ADD REPLY • link updated 2.4 years ago by Ram 43k • written 9.7 years ago by kim ▴ 70