I was wondering if you could advise me on the following.
From a bait-prey pull-down experiment, I have an initial interaction network of 400 proteins built from the STRING database. To retrieve my bait protein and the proteins most strongly associated with it, I selected the bait node and then the first and second rounds of neighbouring nodes that were highly associated with the bait. I then extracted this selection of 100 nodes from the initial network as a separate network, which I refer to as sub-network 1. All of this was done in Cytoscape.
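To make the selection step concrete, here is a minimal plain-Python sketch of the same idea (the actual selection was done in Cytoscape). The edge list, the protein names, the `neighbourhood` helper and the 0.7 score cut-off are all hypothetical and only stand in for "highly associated"; they are not values from my data.

```python
from collections import deque

def neighbourhood(edges, bait, max_steps=2, min_score=0.7):
    """Return {node: distance} for nodes within `max_steps` edges of `bait`,
    walking only edges whose score is at least `min_score` (assumed cut-off)."""
    adjacency = {}
    for a, b, score in edges:
        if score >= min_score:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)

    distance = {bait: 0}
    queue = deque([bait])
    while queue:
        node = queue.popleft()
        if distance[node] == max_steps:
            continue  # do not expand beyond the second round
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance

# Hypothetical toy network: (protein, protein, association score).
edges = [
    ("BAIT", "A", 0.9),
    ("A", "B", 0.8),
    ("B", "C", 0.95),   # C is three steps away, so it is not selected
    ("BAIT", "D", 0.4), # below the cut-off, so D is not selected
    ("D", "E", 0.9),
]

selected = neighbourhood(edges, "BAIT")
# Sub-network: every edge of the original network between two selected nodes.
sub_edges = [(a, b, s) for a, b, s in edges if a in selected and b in selected]
print(sorted(selected), sub_edges)
```

Whether low-scoring edges between two selected nodes should be kept in the sub-network, as this sketch does, is a design choice that would need to be stated in the methods.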
Now I need to explore this sub-network (100 proteins) further and map the biological process relationships within it. I also need to cluster the proteins in this sub-network based on their GO annotations, so that the connections within and between the clusters are shown.
For the visualization of the clustered network, the nodes of each cluster should not be merged into one bigger node. Instead, every node should keep its size and position and only be coloured according to its cluster. I could not find a tool that does this. I am therefore thinking of using the ClueGO plugin for Cytoscape to cluster the proteins by enrichment analysis (GO biological processes), and then colouring the nodes manually according to the enriched function/cluster each gene belongs to.
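As a rough illustration of the colouring I have in mind, the sketch below turns a gene-to-cluster assignment into a small tab-separated table with one colour per node. The protein names, cluster labels, palette and column names are hypothetical, and the `colour_by_cluster` helper is my own, not part of Cytoscape or ClueGO.

```python
import csv
import io

# Hypothetical palette; it is reused cyclically if there are more clusters.
PALETTE = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a"]

def colour_by_cluster(node_cluster, palette=PALETTE):
    """Give every node the colour of its cluster; nodes are never merged."""
    clusters = sorted(set(node_cluster.values()))
    cluster_colour = {c: palette[i % len(palette)] for i, c in enumerate(clusters)}
    return {node: cluster_colour[c] for node, c in node_cluster.items()}

# Hypothetical one-cluster-per-gene assignment.
node_cluster = {"P1": "DNA repair", "P2": "DNA repair", "P3": "cell cycle"}
node_colour = colour_by_cluster(node_cluster)

# Write a node table with one row per protein.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["name", "cluster", "colour"])
for node in sorted(node_cluster):
    writer.writerow([node, node_cluster[node], node_colour[node]])
print(buffer.getvalue())
```

My assumption is that a table like this could be imported into Cytoscape as node attributes and used in a style mapping for the node fill colour, leaving node size and position untouched, but I have not confirmed that this is the recommended route.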
I would greatly appreciate your advice on the following questions:
Is the approach described above, exploring the proteins highly associated with the bait protein by extracting a sub-network, sensible and scientifically acceptable for publication? (Any other approaches or suggestions are highly appreciated.)
Or, instead of the sub-networking above, is it better to first cluster the initial 400-protein network (e.g. with MCL or ClusterONE), then select only the cluster that the bait protein belongs to, and map the biological processes of that cluster only?
As one node/protein may belong to several overlapping clusters, how can I find the top/main function for each protein, especially in the ClueGO results? I have to assign only one cluster to each gene (and only one colour to each node).
Among the ClueGO result files, the file named 'ClueGOResults_GO-BP Genes With Corresponding Functions.txt' is empty for me. How can I obtain the information in this file? I really need to know the p-values with which each individual gene corresponds to each cluster/function, in order to place each protein in only ONE main cluster.
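To show what I mean by "one main cluster per gene", here is a minimal sketch under the assumption that only term-level p-values are available, so each gene is given the enriched term with the smallest p-value among the terms that contain it. The terms, p-values and gene lists are hypothetical, and the `assign_main_term` helper and its input layout are my own and do not reflect the format of any ClueGO output file.

```python
def assign_main_term(term_results):
    """Map each gene to the single term with the smallest p-value.

    `term_results` is {term: (p_value, [genes])}. On a tie, the term
    encountered first is kept.
    """
    best = {}
    for term, (p_value, genes) in term_results.items():
        for gene in genes:
            if gene not in best or p_value < best[gene][1]:
                best[gene] = (term, p_value)
    return {gene: term for gene, (term, _) in best.items()}

# Hypothetical enrichment results: P2 belongs to two overlapping terms.
term_results = {
    "cell cycle": (1e-3, ["P2", "P3"]),
    "DNA repair": (1e-6, ["P1", "P2"]),
}

main_term = assign_main_term(term_results)
print(main_term)
```

This picks the most significant term rather than the most specific one, so a very general term could win for many genes; whether that is an acceptable rule is part of what I am asking.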
I would greatly appreciate your guidance and help in this regard.