I have a list of biological entities (say, genes) and I would like to compute all unique pairs (e.g. (A,B,C) -> (A,B), (A,C), (B,C)) and then calculate the similarity of each pair based on their GO annotations. So far so good, but I have around 2000 entities, so the number of unique pairs (2000 choose 2) is about 2 million, and computing the similarity for all of them would take months. I started computing the similarity for all pairs, and it took 3 weeks to get through 68,000 pairs. I use the GAPGOM method as the similarity measure.
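To make the setup concrete, here is a minimal Python sketch of enumerating the unique pairs and drawing a uniform random sample of them without replacement. The gene identifiers, the sample size of 1000, the seed, and the helper name `sample_unique_pairs` are placeholders invented for illustration. Plain uniform sampling is only a naive baseline, and the sketch does not show whether it is statistically adequate for this problem.

```python
import math
import random
from itertools import combinations


def sample_unique_pairs(entities, k, seed=None):
    """Draw k distinct unordered pairs uniformly at random, without replacement."""
    # Every unordered pair appears exactly once, e.g. (A, B) but never (B, A).
    all_pairs = list(combinations(entities, 2))
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return rng.sample(all_pairs, k)


# Hypothetical placeholder identifiers standing in for the real gene list.
genes = [f"gene_{i}" for i in range(2000)]

# Total number of unique pairs: 2000 choose 2.
n_pairs = math.comb(len(genes), 2)

# Arbitrary illustrative sample size; the right size is part of the question.
sampled = sample_unique_pairs(genes, k=1000, seed=42)
```

Only the sampled pairs would then be passed to the similarity computation, so the cost scales with the sample size rather than with the full set of pairs.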
Can you suggest a sound technique for sampling pairs so that the results are still significant?
Thank you in advance!