Skewed Betweenness Centrality values resulting in almost unusable PPI networks for my university project - what can I do?
24 days ago

For context, I entered lists of proteins into STRING, imported the resulting networks into Cytoscape, scaled the node size by degree, and used continuous mapping to show the betweenness centrality (BC) values as a colour gradient. I have run into a couple of issues:

1) The highest BC value in one of my datasets is 1.0, the second-highest is around 0.7, and the remaining dozens or hundreds of proteins decrease incrementally from there. If I use continuous mapping to show the BC values as a colour gradient, the highest value skews the scale so that nearly all my proteins appear as one block colour rather than a gradient. I am not sure how to interpret that.

2) Additionally, for some of my datasets I raised the confidence threshold in STRING to 0.9, so after importing into Cytoscape I get one large main network, a couple of smaller networks, and isolated single nodes. Some of my highest BC values are in the small networks (around 5 or 6 proteins each). Can I discount the smaller networks that are not connected to the main network, or is that not scientifically sound?

Bioinformatics Cytoscape • 537 views
23 days ago

1- This is a visualization issue. A common solution is to truncate the scale: map colours to e.g. (0:0.6) such that all values above 0.6 will be the same colour. You can also use a bottom threshold if distinguishing lower values is not of interest.
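To make the truncation concrete, here is a minimal sketch of the arithmetic behind a clipped colour scale. It is not Cytoscape code: the function names, the protein labels, the 0.6 cut-off and the white-to-red gradient are all assumptions chosen for illustration.

```python
def clamp_to_scale(value, lower=0.0, upper=0.6):
    """Return the position of `value` on a truncated scale, as a fraction in [0, 1].

    Values below `lower` map to 0 and values above `upper` map to 1,
    so every outlier beyond the cut-off ends up with the same colour.
    """
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    clipped = min(max(value, lower), upper)
    return (clipped - lower) / (upper - lower)


def to_hex_colour(fraction, low_rgb=(255, 255, 255), high_rgb=(255, 0, 0)):
    """Linearly interpolate between two RGB colours (white to red by default)."""
    channels = [round(lo + fraction * (hi - lo)) for lo, hi in zip(low_rgb, high_rgb)]
    return "#{:02x}{:02x}{:02x}".format(*channels)


# Hypothetical BC values: one extreme outlier, one high value, the rest low.
bc_values = {"P1": 1.0, "P2": 0.7, "P3": 0.3, "P4": 0.05}

colours = {name: to_hex_colour(clamp_to_scale(bc)) for name, bc in bc_values.items()}
for name, colour in colours.items():
    print(name, bc_values[name], colour)
```

With the scale capped at 0.6, P1 and P2 share the darkest colour and the gradient is spent on the range where most proteins actually sit. The underlying BC values are unchanged; only the display is clipped.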

2- Since the data extraction from STRING is already arbitrary (why 0.9? With any threshold you're choosing to ignore some interactions), you can also arbitrarily ignore some parts of it. However, make sure that you properly describe the process when reporting the results.

I wouldn't extract all types of interactions from STRING, only those that make sense in the context. For example, if you are interested in gene functions in eukaryotes, the physical interactions are the most useful, whereas co-expression adds a lot of undesirable noise.


Thank you for the response. May I ask for some clarification?

1) Are you saying that one solution would be to map the colours over 0-0.7 and show the top one or two proteins, which are much higher than that, in one single colour? If I then want to identify the top 10% of proteins, do I take the top 10% within the range 0-0.7 or 0-1.0?

2) I am not confident I could justify arbitrarily ignoring some parts of my data, so the idea makes me uneasy.


"Are you saying that a solution could be to change my mapping from 0-0.7, and the top one or two proteins which are much higher than that, then denote as one single colour?"

Yes, the proteins at and above the threshold would be the same colour.

"If I am then looking to identify the top 10% of proteins, do I then look at the top 10% of proteins between the range of 0-0.7 or 0-1.0?"

The top 10% is independent of the visualization, so unless you have a reason to discard the top one or two proteins, they are part of the top 10%.
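As a rough illustration of why the ranking does not depend on the colour scale, the sketch below picks the top 10% directly from the raw BC values. The function name, the protein labels and the 20 example values are made up for the example.

```python
def top_percent(scores, percent=10):
    """Return the names of the top `percent` of nodes, highest score first.

    The count is rounded up with integer arithmetic and is never less than one.
    Ties keep their input order because Python's sort is stable.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_keep = max(1, -(-len(ranked) * percent // 100))  # ceiling division
    return ranked[:n_keep]


# Hypothetical BC values for 20 proteins, including the two outliers.
raw = [1.0, 0.7, 0.31, 0.28, 0.25, 0.22, 0.2, 0.18, 0.15, 0.13,
       0.11, 0.1, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01]
bc_values = {f"P{i}": value for i, value in enumerate(raw, start=1)}

top = top_percent(bc_values)
print(top)
```

The selection uses the unclipped values, so the two outliers are included regardless of where the colour scale was truncated.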

"I am not really quite confident how exactly I would be able to explain arbitrary ignoring some parts of my data"

Just explain what you did. You already chose to discard some information by setting a threshold in STRING (who says that interactions relevant to your study must have a score of 0.9?). Connected components (i.e. the parts of a graph that are not connected to each other) are typically processed separately, but there is not much useful topological information you can extract from a 5-node network, so just say so and move on.
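If it helps with the write-up, here is a minimal sketch of splitting an edge list into connected components so that you can report exactly what was kept and what was set aside. The edge list, node names and function name are hypothetical; in practice the edges would come from your STRING export.

```python
from collections import deque


def connected_components(edges, isolated_nodes=()):
    """Return the connected components of an undirected graph, largest first."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    for node in isolated_nodes:
        adjacency.setdefault(node, set())  # singletons have no neighbours

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Hypothetical network: one main chain, one small cluster, one isolated node.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "G"),
         ("X", "Y"), ("Y", "Z")]
components = connected_components(edges, isolated_nodes=["Q"])

main = components[0]
set_aside = components[1:]
print("kept:", len(main), "nodes")
print("set aside:", [len(c) for c in set_aside])
```

Reporting the sizes of the components you set aside is usually enough to make the choice transparent. One thing worth checking, which I am not certain applies to your setup: if the BC values are normalised within each component, a hub in a 5-node cluster can reach a value close to 1.0 without being comparable to a hub in the main network.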
