Hi all,
I am not sure if anyone here has worked with the COMPLEAT web tool. It is a tool for protein complex enrichment analysis.
I am trying to understand how this tool calculates the score for a protein complex. For that I took one example file, HumanRNAiDNARepairScreen.txt, and uploaded it to COMPLEAT without changing the parameters.
I then took a closer look at the SLIK (SAGA-like) complex. According to the tool it has a score of 0.02553 (= 2.553e-02).
I tried to reproduce this score from the scores of the individual complex members, which I took from the tool's output for the single proteins (see table below).
The formula for calculating the complex score is given in your paper as
CIQM = (1 / ((Q3 - Q1) + 1)) * SUM(Xi from i=Q1 to i=Q3).
This is the vector I am using for the calculations, already sorted in decreasing order.
SLIK <- c(1.68, 1.6, 1.56, 1.47, 1.13, 1.12, 1.03, 0.94, 0.91, 0.59, 0.51, 0.5, 0.49, 0.46, 0.42, 0.31, 0.001, 0, 0, -0.005, -0.1, -0.2, -0.27, -0.45, -0.8, -0.88, -1.09)
As you can see, there are 27 members (=n).
If I calculate Q1 and Q3 according to your paper, I get 7 and 20 for Q1 and Q3 respectively.
Q1 = (27/4) + 1 = 7.75 -> integer is 7
Q3 = (3*27)/4 = 20.25 -> integer is 20
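The two index calculations above can be checked in R. This is only a minimal sketch, assuming that "integer" means rounding down with floor(); the variable names n, q1 and q3 are my own and not taken from the paper.

```r
n  <- 27                    # number of complex members
q1 <- floor(n / 4 + 1)      # 27/4 + 1 = 7.75, rounded down
q3 <- floor(3 * n / 4)      # 3*27/4 = 20.25, rounded down
c(Q1 = q1, Q3 = q3)         # Q1 = 7, Q3 = 20
```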
When I then calculate the IQM with the above formula, I get a different value than the one shown on the screen. The sum of all the vector values between position 7 and position 20 is:
sum(SLIK(7:20)) = sum(0.94, 0.91, 0.59, 0.51, 0.5, 0.49, 0.46, 0.42, 0.31, 0.001, 0, 0, -0.005, -0.1) = 5.026
so the IQM can be calculated like that:
IQM = (1 / ((20 - 7) + 1)) * 5.026 = 0.359
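To make the arithmetic reproducible, here is a minimal R sketch of the formula as I read it. The function name iqm_by_index is hypothetical, and the sketch assumes that the scores are sorted in decreasing order and that positions are 1-based, as in R. Under that assumption, positions 7 to 20 start at 1.03 and sum to 6.156, whereas the 14 values I listed above (0.94 down to -0.1) sit at positions 8 to 21 and sum to 5.026, so both variants are shown.

```r
# Member scores of the SLIK complex, sorted in decreasing order
SLIK <- c(1.68, 1.6, 1.56, 1.47, 1.13, 1.12, 1.03, 0.94, 0.91, 0.59,
          0.51, 0.5, 0.49, 0.46, 0.42, 0.31, 0.001, 0, 0, -0.005,
          -0.1, -0.2, -0.27, -0.45, -0.8, -0.88, -1.09)

# Hypothetical helper: mean of the sorted scores from position q1 to q3
iqm_by_index <- function(x, q1, q3) {
  sum(x[q1:q3]) / ((q3 - q1) + 1)
}

iqm_by_index(SLIK, 7, 20)   # 1-based positions 7..20, about 0.4397
iqm_by_index(SLIK, 8, 21)   # positions 8..21 (the values listed above), 0.359
```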
So I have a discrepancy between my calculated IQM and yours.
I was wondering what I am doing wrong in this calculation. Am I taking the wrong quartiles?
Do I need to take the real quartiles? The quartiles of SLIK can be calculated as follows:
quantile(SLIK)
     0%     25%     50%     75%    100%
-1.0900 -0.0525  0.4600  0.9850  1.6800
Should Q1 and Q3 be -0.0525 and 0.9850? But how do I plug them into the sum in the formula?
Do I need to remove all proteins with a score of 0 from the vector?
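For comparison, the two alternatives I am asking about can be sketched in R. Both are guesses on my side and not something taken from the paper: iqm_by_value (a hypothetical name) averages the scores that lie between the 25% and 75% quantile values, and the second call first drops the members with a score of exactly 0.

```r
# Member scores of the SLIK complex
SLIK <- c(1.68, 1.6, 1.56, 1.47, 1.13, 1.12, 1.03, 0.94, 0.91, 0.59,
          0.51, 0.5, 0.49, 0.46, 0.42, 0.31, 0.001, 0, 0, -0.005,
          -0.1, -0.2, -0.27, -0.45, -0.8, -0.88, -1.09)

# Hypothetical helper: mean of all scores within the interquartile range,
# using the quartile values (R's default quantile type) instead of positions
iqm_by_value <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  mean(x[x >= q[1] & x <= q[2]])
}

iqm_by_value(SLIK)              # all 27 members
iqm_by_value(SLIK[SLIK != 0])   # assumption: zero scores are excluded first
```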
I appreciate all the help in advance and hope you can help me solve this problem.
cu,
Assa
SLIK complex members:
Symbol ID Name Score
TAF10 6881 TAF10 -0.8
TAF5 6877 TAF5 -0.005
KAT2A 2648 KAT2A 0.46
KAT2B 8850 KAT2B -1.09
BPTF 2186 BPTF 0.49
USP51 158880 USP51 0
USP22 23326 USP22 1.12
TAF6L 10629 TAF6L -0.88
CHD1 1105 CHD1 1.68
TAF9B 51616 TAF9B -0.2
TADA3 10474 TADA3 -0.27
SUPT3H 8464 SUPT3H 1.03
ATXN7 6314 ATXN7 0.42
TRRAP 8295 TRRAP 0.94
USP3 9960 USP3 0.31
CHD3 1107 CHD3 0.91
CECR2 27443 CECR2 1.56
LAPTM5 7805 LAPTM5 0.001
TADA2B 93624 TADA2B 1.6
CHD2 1106 CHD2 1.13
CHD4 1108 CHD4 0
TAF5L 27097 TAF5L 1.47
TADA1 117143 TADA1 0.51
TAF12 6883 TAF12 -0.1
TAF9 6880 TAF9 0.5
TAF6 6878 TAF6 -0.45
TADA2A 6871 TADA2A 0.59