Question

How to select pairs to have a significant sample for similarity

1

5.7 years ago

Damianos P. Melidis ▴ 60

Dear all,

I have a list of biological entities (say, genes) and I would like to compute all unique pairs (e.g. (A,B,C) -> (A,B), (A,C), (B,C)) and then calculate the similarity of each pair based on their GO annotations. So far so good, but with about 2000 entities the number of unique pairs is roughly 2 million (2000 choose 2 = 1,999,000), so computing the similarity for all of them would take months. I have started computing the similarity for all pairs, and it took 3 weeks for 68000 pairs. As the similarity measure I use the GAPGOM method.
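To make the scale concrete, here is a quick back-of-the-envelope sketch in Python. It uses only the figures quoted above (2000 entities, 68000 pairs in 3 weeks) and assumes the throughput stays constant, which may not hold in practice.

```python
from math import comb

n_genes = 2000
n_pairs = comb(n_genes, 2)          # number of unique unordered pairs

pairs_done = 68_000                 # pairs computed so far (from the post)
weeks_spent = 3                     # time that took (from the post)
rate = pairs_done / weeks_spent     # pairs per week, assumed constant

weeks_needed = n_pairs / rate
print(f"unique pairs: {n_pairs}")
print(f"estimated time: {weeks_needed:.1f} weeks "
      f"(about {weeks_needed / 52:.1f} years)")
```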

Can you therefore suggest a sound technique for sampling pairs so that the result is statistically significant?
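One simple option is uniform random sampling of unordered pairs, drawn without listing all ~2 million of them first. The sketch below is only an illustration of that idea, not a recommendation from the thread: the function name `sample_unique_pairs`, the placeholder gene names, and the sample size of 5000 are all assumptions made for the example.

```python
import random

def sample_unique_pairs(items, k, seed=0):
    """Draw k distinct unordered pairs uniformly at random from items."""
    n = len(items)
    max_pairs = n * (n - 1) // 2
    if k > max_pairs:
        raise ValueError(f"only {max_pairs} unique pairs exist, asked for {k}")
    rng = random.Random(seed)       # fixed seed so the sample is reproducible
    chosen = set()
    while len(chosen) < k:
        i, j = rng.sample(range(n), 2)            # two distinct indices
        chosen.add((min(i, j), max(i, j)))        # (A,B) and (B,A) are the same pair
    return [(items[i], items[j]) for i, j in sorted(chosen)]

# Hypothetical gene identifiers, for illustration only.
genes = [f"gene{i}" for i in range(2000)]
pairs = sample_unique_pairs(genes, 5000)
print(len(pairs), pairs[:3])
```

Rejecting duplicates in a loop is cheap as long as the sample is small relative to the total number of pairs, which is the case here.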

Thank you in advance!

similarity gene ontology sampling pairs • 1.5k views

ADD COMMENT • link updated 5.6 years ago by Biostar 20 • written 5.7 years ago by Damianos P. Melidis ▴ 60

2

I would like to compute all unique pairs (e.g. (A,B,C) -> (A,B), (A,C), (B,C)) and then calculate their similarity based on their GO.

Why are you doing this with so many pairs? I don't think the GAPGOM package was designed with this in mind.

ADD REPLY • link 5.7 years ago by Mark ★ 1.6k

0

Because I am working on a similarity function and I want to test how well it correlates with GO similarity. I can't really do this with a small number of gene pairs, because the result would not be significant.
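As a rough guide to how many pairs "enough" might be, the Fisher z-transformation gives an approximate confidence interval for a correlation coefficient as a function of sample size. The sketch below assumes independent pairs and roughly bivariate-normal scores; pairs that share a gene are not independent, so the real intervals are likely wider. The observed correlation of 0.5 is a made-up value for illustration.

```python
import math

def correlation_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a correlation r estimated from n pairs (Fisher z)."""
    z = math.atanh(r)                 # Fisher transformation of r
    se = 1.0 / math.sqrt(n - 3)       # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Hypothetical observed correlation; only the trend with n matters here.
for n in (100, 1000, 10000):
    low, high = correlation_ci(0.5, n)
    print(f"n={n:>6}: CI width = {high - low:.3f}")
```

The interval narrows roughly with the square root of the number of pairs, so a sample of a few thousand pairs may already pin the correlation down fairly tightly.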

ADD REPLY • link 5.7 years ago by Damianos P. Melidis ▴ 60

0

On the point of comparing gene similarity with GO similarity, you may be interested in this paper, on which I collaborated.

ADD REPLY • link 5.6 years ago by Jean-Karim Heriche 27k

0

How are you doing the computation? Typical semantic similarity measures such as Resnik's, computed over the GO biological process domain, should take a few hours for all ~20000 human protein-coding genes without parallelization.
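For reference, Resnik similarity is the information content (IC) of the most informative term shared by two genes, where IC(t) = -log p(t) and p(t) is the fraction of genes annotated with t or any of its descendants. The toy sketch below is not GAPGOM and not any particular library's implementation; the ontology, term names, and annotations are invented. It only shows why the measure is cheap once ancestors and IC values are precomputed.

```python
import math

# Hypothetical toy ontology: term -> parent terms.
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "signaling": ["root"],
    "lipid_metabolism": ["metabolism"],
    "sugar_metabolism": ["metabolism"],
}

# Hypothetical direct annotations per gene.
ANNOTATIONS = {
    "geneA": {"lipid_metabolism"},
    "geneB": {"sugar_metabolism"},
    "geneC": {"signaling"},
    "geneD": {"lipid_metabolism", "signaling"},
}

def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

# Precompute once: annotations propagated up to the root.
GENE_TERMS = {g: set().union(*(ancestors(t) for t in ts))
              for g, ts in ANNOTATIONS.items()}

# Precompute once: IC(t) = -log(fraction of genes annotated with t).
N = len(GENE_TERMS)
IC = {t: -math.log(sum(t in ts for ts in GENE_TERMS.values()) / N)
      for t in PARENTS}

def resnik(g1, g2):
    """IC of the most informative term shared by both genes (max combination)."""
    shared = GENE_TERMS[g1] & GENE_TERMS[g2]
    return max(IC[t] for t in shared)

print(resnik("geneA", "geneB"), resnik("geneA", "geneD"), resnik("geneA", "geneC"))
```

With the lookups precomputed, each pair costs one set intersection and one max, which is consistent with all pairs finishing in hours rather than weeks.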

ADD REPLY • link 5.6 years ago by Jean-Karim Heriche 27k