Question

How to predict cancer-related proteins in Protein-Protein Interaction networks

9.5 years ago

txd866 • 0

I have a known protein-protein interaction (PPI) network, which is an undirected graph. Each row of the network file looks like this (Protein 2 - Protein 6) and represents an interaction between Protein 2 and Protein 6.

networks:
Protein 2 - Protein 6
Protein 4 - Protein 5
Protein 6 - Protein 5
Protein 5 - Protein 7
...

In this network, the functions of some proteins are known, and proteins with similar functions tend to be related.

The function of some proteins:
Protein 2,Func_002
Protein 2,Func_007
Protein 2,Func_008
Protein 3,Func_007
Protein 3,Func_008
Protein 3,Func_009
Protein 4,Func_011
Protein 5,Func_015
...

It is also known that some of the proteins are cancer-related.

The known proteins:
Protein 4,Cancer
Protein 6,Cancer
Protein 7,Cancer
Protein 10,Cancer
...

But for the vast majority of proteins, it is unknown whether they are cancer-related or not. How can I use the known cancer-related and non-cancer-related proteins to predict whether an unlabelled protein is cancer-related?

I do not know how to solve this problem.

networks protein • 2.9k views

ADD COMMENT • link updated 2.9 years ago by Ram 45k • written 9.5 years ago by txd866 • 0

score 4 · Accepted Answer · 2016-01-08

9.5 years ago

Jean-Karim Heriche 27k

Spread labels on the network. This paper describes the algorithm for label propagation. You may also find this review and the papers it references useful.

ADD COMMENT • link 9.5 years ago by Jean-Karim Heriche 27k

Dear Jean-Karim Heriche, how should I interpret the result of the label propagation algorithm? Thank you. This is my question: http://stackoverflow.com/questions/34701650/how-to-understand-the-result-of-label-propagation-algorithm

ADD REPLY • link 9.5 years ago by txd866 • 0

This is related to diffusion. Imagine that your labelled nodes are colored blue and you let this color diffuse through the edges: the higher the edge weight, the more color goes through it. What you compute can be thought of as the amount of blue that reaches the unlabelled nodes at equilibrium. For more than one class, imagine that you have red and blue nodes and that both colors diffuse through the edges in proportion to the edge weight. The resulting matrix can then be viewed as giving the amount of red and blue at each unlabelled node.
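To make the diffusion picture concrete, here is a minimal pure-Python sketch of one common variant of label propagation: each node repeatedly takes a degree-normalized share of its neighbours' scores and is pulled back towards its initial label. This is not necessarily the exact algorithm in the linked paper. The function name `propagate_labels`, the parameters `alpha`, `n_iter` and `tol`, and their default values are assumptions made for illustration, and the edges and seeds are just the sample rows from the question.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, alpha=0.8, n_iter=100, tol=1e-9):
    """Diffuse a score of 1.0 from seed nodes over an undirected graph.

    edges: iterable of (u, v) or (u, v, weight) tuples
    seeds: set of nodes known to be cancer-related
    alpha: share of each node's score taken from its neighbours (assumed value)
    """
    # Build a symmetric weighted adjacency map; unweighted edges get weight 1.
    adj = defaultdict(dict)
    for edge in edges:
        u, v = edge[0], edge[1]
        w = edge[2] if len(edge) > 2 else 1.0
        adj[u][v] = w
        adj[v][u] = w

    degree = {n: sum(nb.values()) for n, nb in adj.items()}
    initial = {n: 1.0 if n in seeds else 0.0 for n in adj}
    scores = dict(initial)

    for _ in range(n_iter):
        updated = {}
        for n in adj:
            # Incoming "color", weighted by edge weight and normalized by degrees.
            spread = sum(
                w * scores[m] / (degree[n] * degree[m]) ** 0.5
                for m, w in adj[n].items()
            )
            updated[n] = alpha * spread + (1 - alpha) * initial[n]
        delta = max(abs(updated[n] - scores[n]) for n in adj)
        scores = updated
        if delta < tol:  # stop once the scores have (nearly) reached equilibrium
            break
    return scores


# Sample rows from the question; Protein 10 is omitted because its edges are not shown.
edges = [
    ("Protein 2", "Protein 6"),
    ("Protein 4", "Protein 5"),
    ("Protein 6", "Protein 5"),
    ("Protein 5", "Protein 7"),
]
seeds = {"Protein 4", "Protein 6", "Protein 7"}

scores = propagate_labels(edges, seeds)
for protein in sorted(scores, key=scores.get, reverse=True):
    print(protein, round(scores[protein], 3))
```

The scores are relative amounts of "color", not probabilities, so they are best used to rank the unlabelled proteins. For two classes (cancer and non-cancer), you could run the same function once per class and compare the two scores at each unlabelled node.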

ADD REPLY • link 9.5 years ago by Jean-Karim Heriche 27k

Ram · Accepted Answer · 2016-01-08

There are two ways to utilize PPI information.

You can map the known cancer genes onto the PPI network and then either extract a sub-network or cluster the network. You can then prioritize the sub-networks or clusters that are significantly enriched for your query (known cancer) genes. The top-ranked clusters will contain many known cancer genes along with new genes, and you can check whether the new genes have any relevance to your disease of interest. Proteins with similar functions are expected to cluster together. Note that this approach only gives you a way to generate new testable hypotheses.

There are many ways to extract a sub-network or cluster. Please see Cytoscape; KeyPathwayMiner and ClusterViz are some useful Cytoscape apps.
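As a rough illustration of the "significantly enriched" step, one option (an assumption here, not something the apps above necessarily do) is a one-sided hypergeometric test on each cluster. The sketch below uses only the Python standard library; the function name `hypergeom_enrichment_p` and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_total, n_cancer, cluster_size, cancer_in_cluster):
    """One-sided hypergeometric p-value for cluster enrichment.

    Returns the probability of seeing at least `cancer_in_cluster` known cancer
    proteins in a cluster of `cluster_size` proteins drawn at random, without
    replacement, from a network of `n_total` proteins of which `n_cancer` are
    known cancer proteins.
    """
    upper = min(n_cancer, cluster_size)
    favourable = sum(
        comb(n_cancer, k) * comb(n_total - n_cancer, cluster_size - k)
        for k in range(cancer_in_cluster, upper + 1)
    )
    return favourable / comb(n_total, cluster_size)


# Hypothetical counts: 100 proteins in the network, 10 known cancer proteins,
# and a cluster of 5 proteins that contains 3 of them.
p_value = hypergeom_enrichment_p(100, 10, 5, 3)
print(f"enrichment p-value: {p_value:.4g}")
```

If you test many clusters this way, the p-values should be corrected for multiple testing before ranking the clusters.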

You can check our past papers for examples : )
- You can identify disease-relevant genes from gene expression data and then use PPI: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0112193
- You can collect disease gene data from the literature and use it with PPI: https://www.researchgate.net/publication/236062631_Systems_Biology_Approaches_for_Discovering_Biomarkers_for_Traumatic_Brain_Injury

In case you have not already seen it, Ideker's group has a nice paper where they developed a network-based classifier.