Seeing Whether Differences in Amino Acid Hydrophobicity Between Mutant Proteins Are Statistically Significant
13.2 years ago
Swatchpuppy

Hello,

Let's imagine that we have the following sequences of five Protein kinase C mutants:

S1  KVLGKGSFGKVMLADDKGTEELYA  24
S2  MVLFKGSFGKVMLGDRKGTEELYA  24
S3  MVLGKGSFGKVMLADRKG-EELYA  23
S4  MVLGKGSAGKVMLADRKGTEFLYA  24
S5  MVLGKGS-GKVMLFDRKGTEELYA  23
     ** *** ***** * ** * ***

After computing the Kyte & Doolittle hydrophobicity per amino acid, I obtained the following table:

AA       S1      S2      S3      S4      S5
1     0.633   1.633   1.278   1.278   0.922
2     0.278   1.278   0.922   0.922   0.567
3     0.578   1.578   1.222   1.222   0.867
4     0.578   1.578   1.222   1.111   1.222
5     0.111   1.111   0.756   0.644   0.756
6     0.111   0.467   0.111   0.000   0.111
7     0.111   0.467   0.111   0.000   0.111
8    -0.100   0.256  -0.100  -0.211      NA
9     0.367   0.367   0.367   0.256   0.367
10    1.000   0.756   1.000   0.889   1.111
11    0.656   0.411   0.656   0.544   0.767
12    0.356   0.000   0.244   0.133   0.356
13   -0.389  -0.744  -0.500  -0.500  -0.389
14   -0.389  -0.744  -0.500  -0.500  -0.389
15   -0.033  -0.389  -0.144  -0.144  -0.033
16   -0.889  -1.244  -1.000  -1.000  -0.889
17   -1.489  -1.844  -1.600  -0.900  -1.489
18   -1.489  -1.844  -1.600  -0.900  -1.489
19   -1.833  -1.944      NA  -1.244  -1.944
20   -1.244  -1.356  -1.356  -0.656  -1.356
21   -0.356  -0.356  -0.356   0.344  -0.356
22   -0.356  -0.356  -0.356   0.344  -0.356
23    0.189   0.189   0.189   0.889   0.189
24    0.689   0.689   0.689   1.389   0.689

There is also a priori knowledge about each mutant's binding to a trial molecule and about its function:

     Binding   Functional
S1      1           1
S2      1           1
S3      2           0
S4      2           0
S5      0           1

What statistical tests would you recommend to:

1. See whether the differences between the mean hydrophobicities of the mutant proteins are statistically significant:

H0: mu1 = mu2 = mu3 = ... = mu_n

2. Do the same as above, but comparing each pair of mutants individually:

      S1   S2   S3   S4   ...  Sn
S2    p    -    -    -    ...  -
S3    p    p    -    -    ...  -
...   .    .    .    .    ...  -
Sn    p    p    p    p    ...  -

3. See whether the amino acid property can be related in some way to the binding and functional properties?


  1. I think a one-way ANOVA won't do much good, because these can be seen as paired samples (paired by amino acid position).
    Do you think a repeated-measures ANOVA can be used here?

  2. I was thinking of using pairwise.t.test for paired samples in R, with Bonferroni as the method for adjusting the p-values.
    What do you think of this method?
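Both ideas can be sketched in Python, assuming NumPy and SciPy; the post mentions R, but nothing here depends on it. The sketch runs a one-way repeated-measures ANOVA that treats each alignment position as a "subject" and each mutant as a "treatment". It then runs a paired t-test for every pair of mutants, with a Bonferroni adjustment (each p-value is multiplied by the number of comparisons and capped at 1). Two assumptions to flag. First, positions with a gap in any mutant (8 and 19) are dropped, so every subject is complete. Second, no sphericity correction (e.g. Greenhouse-Geisser) is applied. The helper names `rm_anova` and `pairwise_paired_t` are hypothetical.

```python
import itertools

import numpy as np
from scipy import stats

nan = np.nan
# Kyte & Doolittle values from the table above: rows = positions 1-24,
# columns = mutants S1-S5; nan marks the alignment gaps ("NA").
H = np.array([
    [0.633, 1.633, 1.278, 1.278, 0.922],
    [0.278, 1.278, 0.922, 0.922, 0.567],
    [0.578, 1.578, 1.222, 1.222, 0.867],
    [0.578, 1.578, 1.222, 1.111, 1.222],
    [0.111, 1.111, 0.756, 0.644, 0.756],
    [0.111, 0.467, 0.111, 0.000, 0.111],
    [0.111, 0.467, 0.111, 0.000, 0.111],
    [-0.100, 0.256, -0.100, -0.211, nan],
    [0.367, 0.367, 0.367, 0.256, 0.367],
    [1.000, 0.756, 1.000, 0.889, 1.111],
    [0.656, 0.411, 0.656, 0.544, 0.767],
    [0.356, 0.000, 0.244, 0.133, 0.356],
    [-0.389, -0.744, -0.500, -0.500, -0.389],
    [-0.389, -0.744, -0.500, -0.500, -0.389],
    [-0.033, -0.389, -0.144, -0.144, -0.033],
    [-0.889, -1.244, -1.000, -1.000, -0.889],
    [-1.489, -1.844, -1.600, -0.900, -1.489],
    [-1.489, -1.844, -1.600, -0.900, -1.489],
    [-1.833, -1.944, nan, -1.244, -1.944],
    [-1.244, -1.356, -1.356, -0.656, -1.356],
    [-0.356, -0.356, -0.356, 0.344, -0.356],
    [-0.356, -0.356, -0.356, 0.344, -0.356],
    [0.189, 0.189, 0.189, 0.889, 0.189],
    [0.689, 0.689, 0.689, 1.389, 0.689],
])
mutants = ["S1", "S2", "S3", "S4", "S5"]

# Assumption: complete-case analysis. Drop the gap positions so that every
# position ("subject") is observed under every mutant ("treatment").
X = H[~np.isnan(H).any(axis=1)]


def rm_anova(X):
    """One-way repeated-measures ANOVA without sphericity correction.

    Rows of X are subjects (positions), columns are treatments (mutants).
    Returns (F, df_treatment, df_error, p).
    """
    n, k = X.shape
    grand = X.mean()
    ss_treat = n * ((X.mean(axis=0) - grand) ** 2).sum()
    ss_subj = k * ((X.mean(axis=1) - grand) ** 2).sum()
    ss_error = ((X - grand) ** 2).sum() - ss_treat - ss_subj
    df_treat, df_error = k - 1, (n - 1) * (k - 1)
    f = (ss_treat / df_treat) / (ss_error / df_error)
    return f, df_treat, df_error, stats.f.sf(f, df_treat, df_error)


def pairwise_paired_t(X, labels):
    """Paired t-test for every pair of columns, Bonferroni-adjusted p-values."""
    pairs = list(itertools.combinations(range(X.shape[1]), 2))
    adjusted = {}
    for i, j in pairs:
        _, p = stats.ttest_rel(X[:, i], X[:, j])
        adjusted[(labels[i], labels[j])] = min(1.0, p * len(pairs))
    return adjusted


f, df1, df2, p = rm_anova(X)
print(f"RM-ANOVA: F({df1}, {df2}) = {f:.3f}, p = {p:.3g}")
for (a, b), p_adj in pairwise_paired_t(X, mutants).items():
    print(f"{a} vs {b}: Bonferroni-adjusted p = {p_adj:.3g}")
```

With only two columns, the repeated-measures F equals the square of the paired t statistic, which is a quick sanity check on the sums of squares. As the replies below point out, neighbouring positions are not independent, so these p-values may be optimistic.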


NOTE: This is fabricated data.

I have read the http://biostar.stackexchange.com/questions/4208/statistical-analysis-of-protein-sequence-properties post, and I reckon the two problems have a few similarities, but their objectives are quite different.


Thanks in advance.


Could you please be a little more specific? Which means? Besides that, many amino acid properties are not independent, especially on a per-residue basis. Can you state the question you want to test?


Yes,

I think so. But I think you misunderstood the property: there is just one, hydrophobicity, with one observation per amino acid per protein.

H: Are the mean hydrophobicities different between proteins?
(Probably a one-way ANOVA)

H: Which proteins have mean hydrophobicities that differ from each other? (I'm thinking of post-hoc tests here)

H: How are the means correlated with binding?
(This is probably a classification problem, or a simple correlation test)

H: How are the means correlated with function?
(Idem)
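For the last two hypotheses, one simple option is to reduce each mutant to its mean hydrophobicity (gaps ignored) and correlate that mean with the a priori labels. The sketch uses Spearman's rank correlation for binding, assuming the 0/1/2 binding codes are ordinal. For the 0/1 functional label it uses a point-biserial correlation (a Pearson r against a binary variable). It assumes NumPy and SciPy. With only five mutants these tests have very little power, so the p-values should not be taken at face value.

```python
import numpy as np
from scipy import stats

nan = np.nan
# Hydrophobicity table from the question: rows = positions 1-24,
# columns = S1-S5; nan marks the alignment gaps ("NA").
H = np.array([
    [0.633, 1.633, 1.278, 1.278, 0.922],
    [0.278, 1.278, 0.922, 0.922, 0.567],
    [0.578, 1.578, 1.222, 1.222, 0.867],
    [0.578, 1.578, 1.222, 1.111, 1.222],
    [0.111, 1.111, 0.756, 0.644, 0.756],
    [0.111, 0.467, 0.111, 0.000, 0.111],
    [0.111, 0.467, 0.111, 0.000, 0.111],
    [-0.100, 0.256, -0.100, -0.211, nan],
    [0.367, 0.367, 0.367, 0.256, 0.367],
    [1.000, 0.756, 1.000, 0.889, 1.111],
    [0.656, 0.411, 0.656, 0.544, 0.767],
    [0.356, 0.000, 0.244, 0.133, 0.356],
    [-0.389, -0.744, -0.500, -0.500, -0.389],
    [-0.389, -0.744, -0.500, -0.500, -0.389],
    [-0.033, -0.389, -0.144, -0.144, -0.033],
    [-0.889, -1.244, -1.000, -1.000, -0.889],
    [-1.489, -1.844, -1.600, -0.900, -1.489],
    [-1.489, -1.844, -1.600, -0.900, -1.489],
    [-1.833, -1.944, nan, -1.244, -1.944],
    [-1.244, -1.356, -1.356, -0.656, -1.356],
    [-0.356, -0.356, -0.356, 0.344, -0.356],
    [-0.356, -0.356, -0.356, 0.344, -0.356],
    [0.189, 0.189, 0.189, 0.889, 0.189],
    [0.689, 0.689, 0.689, 1.389, 0.689],
])
binding = np.array([1, 1, 2, 2, 0])      # a priori labels for S1..S5
functional = np.array([1, 1, 0, 0, 1])

# One number per mutant: its mean hydrophobicity, ignoring gap positions.
means = np.nanmean(H, axis=0)

# Assumes the binding codes 0/1/2 are ordered, hence a rank correlation.
rho, p_rho = stats.spearmanr(means, binding)
# Point-biserial correlation: Pearson r between the 0/1 label and the means.
r_pb, p_pb = stats.pointbiserialr(functional, means)

print("mean hydrophobicity per mutant:", np.round(means, 3))
print(f"Spearman (mean vs binding): rho = {rho:.3f}, p = {p_rho:.3g}")
print(f"Point-biserial (function vs mean): r = {r_pb:.3f}, p = {p_pb:.3g}")
```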


I forgot to ask: which scale are you using?


As promised: a paired t-test with Bonferroni's correction is very similar to ANOVA. In either case, the most precise way to interpret your design is "same subject, different treatments". The tests I suggested assume independent samples. As I said before, on a per-residue basis hydrophobicity isn't an independent measure; methods to estimate it normally use a window of 3-5 aa. Still, you can use ANOVA for testing all pairs.
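A sliding-window average over the Kyte & Doolittle scale can be sketched in plain Python as follows. The window of 5 is an assumption taken from the 3-5 aa range mentioned above. `window_profile` is a hypothetical helper and not a reimplementation of ExPASy ProtScale. Each output value is the unweighted mean of one full window, so a sequence of length L yields L - window + 1 values.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_profile(seq, window=5):
    """Unweighted mean K&D value of each full window along seq.

    Hypothetical helper: result[i] is the mean over seq[i:i + window], so a
    sequence of length L yields L - window + 1 values. Gaps ("-") must be
    removed beforehand; unknown residues raise KeyError.
    """
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Example: mutant S1 from the alignment in the question (it has no gaps).
s1 = "KVLGKGSFGKVMLADDKGTEELYA"
profile = window_profile(s1, window=5)
print([round(v, 3) for v in profile])
```

Because each value averages over its neighbours, adjacent profile values share most of their residues. This is the dependence between positions that the reply above warns about.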


You are right about the need for methods that use a window of 3-5 aa, although I don't think it matters for this particular analysis (hydrophobicity). But let's imagine that we are studying another property that depends on the surrounding amino acids: what methods do you think would be appropriate for that?


Hydrophobicity depends on the neighbors; check the scales at ExPASy/ProtScale. If you really want to perform a powerful analysis that cross-analyzes sequence-function relationships (amino acid properties included), you must check the Ranganathan Lab (http://www.hhmi.swmed.edu/Labs/rr/). He developed the most powerful methods to date. Quite laborious, but worth a try!


Thanks for the hint. I'll let you know about the results.

13.2 years ago

You're probably right in your comment. I've checked the hydrophobicity distribution for soluble proteins: it can safely be approximated by a Gaussian (within a single protein, too!). So you can indeed use ANOVA to test for differences in mean hydrophobicity. After that, Bartlett's test would suffice to separate the groups. You can also use Levene's or Brown-Forsythe. For the third and fourth points, you could use logistic regression or linear discriminant analysis.
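The tests mentioned above can be sketched with SciPy, treating each mutant as an independent group (gap positions are dropped per mutant). Note that `scipy.stats.bartlett` and `scipy.stats.levene` test whether the groups have equal variances, which is ANOVA's homogeneity-of-variance assumption. `levene(..., center="median")` is the Brown-Forsythe variant. For pairwise comparisons of group means, the sketch adds Tukey's HSD via `scipy.stats.tukey_hsd`, which assumes SciPy 1.8 or later.

```python
import numpy as np
from scipy import stats

nan = np.nan
# Hydrophobicity table from the question: rows = positions 1-24,
# columns = S1-S5; nan marks the alignment gaps ("NA").
H = np.array([
    [0.633, 1.633, 1.278, 1.278, 0.922],
    [0.278, 1.278, 0.922, 0.922, 0.567],
    [0.578, 1.578, 1.222, 1.222, 0.867],
    [0.578, 1.578, 1.222, 1.111, 1.222],
    [0.111, 1.111, 0.756, 0.644, 0.756],
    [0.111, 0.467, 0.111, 0.000, 0.111],
    [0.111, 0.467, 0.111, 0.000, 0.111],
    [-0.100, 0.256, -0.100, -0.211, nan],
    [0.367, 0.367, 0.367, 0.256, 0.367],
    [1.000, 0.756, 1.000, 0.889, 1.111],
    [0.656, 0.411, 0.656, 0.544, 0.767],
    [0.356, 0.000, 0.244, 0.133, 0.356],
    [-0.389, -0.744, -0.500, -0.500, -0.389],
    [-0.389, -0.744, -0.500, -0.500, -0.389],
    [-0.033, -0.389, -0.144, -0.144, -0.033],
    [-0.889, -1.244, -1.000, -1.000, -0.889],
    [-1.489, -1.844, -1.600, -0.900, -1.489],
    [-1.489, -1.844, -1.600, -0.900, -1.489],
    [-1.833, -1.944, nan, -1.244, -1.944],
    [-1.244, -1.356, -1.356, -0.656, -1.356],
    [-0.356, -0.356, -0.356, 0.344, -0.356],
    [-0.356, -0.356, -0.356, 0.344, -0.356],
    [0.189, 0.189, 0.189, 0.889, 0.189],
    [0.689, 0.689, 0.689, 1.389, 0.689],
])
labels = ["S1", "S2", "S3", "S4", "S5"]

# Each mutant as an independent group; drop its own gap positions.
groups = [col[~np.isnan(col)] for col in H.T]

f, p_anova = stats.f_oneway(*groups)                  # equal means?
b, p_bartlett = stats.bartlett(*groups)               # equal variances?
w_bf, p_bf = stats.levene(*groups, center="median")   # Brown-Forsythe variant
w_lev, p_lev = stats.levene(*groups, center="mean")   # original Levene test
tukey = stats.tukey_hsd(*groups)                      # pairwise mean differences

print(f"one-way ANOVA:  F = {f:.3f}, p = {p_anova:.3g}")
print(f"Bartlett:       p = {p_bartlett:.3g}")
print(f"Brown-Forsythe: p = {p_bf:.3g}")
print(f"Levene (mean):  p = {p_lev:.3g}")
for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        print(f"{labels[i]} vs {labels[j]}: Tukey HSD p = {tukey.pvalue[i, j]:.3g}")
```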

Here is the hydrophobicity distribution for protein kinase C (brain isozyme), using Kyte & Doolittle:

-3.378    -3.11075  3
-3.11075  -2.8435   3
-2.8435   -2.57625  8
-2.57625  -2.309    11
-2.309    -2.04175  12
-2.04175  -1.7745   27
-1.7745   -1.50725  52
-1.50725  -1.24     35
-1.24     -0.97275  55
-0.97275  -0.7055   63
-0.7055   -0.43825  65
-0.43825  -0.171    83
-0.171    0.09625   66
0.09625   0.3635    65
0.3635    0.63075   53
0.63075   0.898     32
0.898     1.16525   19
1.16525   1.4325    8
1.4325    1.69975   4
1.69975   1.967     6
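One way to check the Gaussian claim against the table above is to fit a normal distribution to the binned data and compare expected and observed counts. This sketch uses only the Python standard library. It assumes the last column is the number of windowed values in each bin, which the thread never states explicitly. Because it works from bin midpoints, the mean and SD are grouped-data approximations. The bin edges are consistent with 20 equal-width bins spanning the minimum (-3.378) to the maximum (1.967), since (1.967 + 3.378) / 20 = 0.26725.

```python
import math
from statistics import NormalDist

# (lower edge, upper edge, count) for each bin of the table above.
bins = [
    (-3.378, -3.11075, 3),
    (-3.11075, -2.8435, 3),
    (-2.8435, -2.57625, 8),
    (-2.57625, -2.309, 11),
    (-2.309, -2.04175, 12),
    (-2.04175, -1.7745, 27),
    (-1.7745, -1.50725, 52),
    (-1.50725, -1.24, 35),
    (-1.24, -0.97275, 55),
    (-0.97275, -0.7055, 63),
    (-0.7055, -0.43825, 65),
    (-0.43825, -0.171, 83),
    (-0.171, 0.09625, 66),
    (0.09625, 0.3635, 65),
    (0.3635, 0.63075, 53),
    (0.63075, 0.898, 32),
    (0.898, 1.16525, 19),
    (1.16525, 1.4325, 8),
    (1.4325, 1.69975, 4),
    (1.69975, 1.967, 6),
]

counts = [c for _, _, c in bins]
mids = [(lo + hi) / 2 for lo, hi, _ in bins]
total = sum(counts)

# Grouped-data estimates of the mean and standard deviation (bin midpoints).
mean = sum(m * c for m, c in zip(mids, counts)) / total
var = sum(c * (m - mean) ** 2 for m, c in zip(mids, counts)) / (total - 1)
fit = NormalDist(mean, math.sqrt(var))

# Expected count per bin under the fitted Gaussian, and Pearson's chi-square.
expected = [total * (fit.cdf(hi) - fit.cdf(lo)) for lo, hi, _ in bins]
chi2 = sum((c - e) ** 2 / e for c, e in zip(counts, expected))

for (lo, hi, c), e in zip(bins, expected):
    print(f"{lo:9.5f} {hi:9.5f} {c:4d} {e:7.1f}")
print(f"n = {total}, mean = {mean:.3f}, sd = {fit.stdev:.3f}, chi-square = {chi2:.1f}")
```

To turn the statistic into a p-value, compare it with a chi-square distribution on 20 - 1 - 2 = 17 degrees of freedom (e.g. `scipy.stats.chi2.sf`). Sparse tail bins with small expected counts are usually merged first.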

What do you think?


What does the last column correspond to?


I think a paired-sample analysis might be more powerful. Read the next answer.

13.2 years ago

I think these data are ideal for a supervised multivariate analysis approach. Check out the MADE4 R package, which might be applicable to this problem (if you can get the data into the right format). The most relevant section is the between-group analysis (BGA).
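MADE4 is an R/Bioconductor package, so the following does not use its API. It is a NumPy sketch of the idea behind between-group analysis: ordinate the class centroids rather than the individual samples, then project every sample onto the resulting axes. The assumptions are as follows. The mutants are the samples, the complete alignment positions are the variables, and binding class is the grouping. Classes are weighted by size, and plain centred PCA is used; MADE4 supports other ordinations, such as correspondence analysis, and its scaling differs. With k classes there are at most k - 1 between-group axes.

```python
import numpy as np

nan = np.nan
# Hydrophobicity table from the question: rows = positions 1-24,
# columns = S1-S5; nan marks the alignment gaps ("NA").
H = np.array([
    [0.633, 1.633, 1.278, 1.278, 0.922],
    [0.278, 1.278, 0.922, 0.922, 0.567],
    [0.578, 1.578, 1.222, 1.222, 0.867],
    [0.578, 1.578, 1.222, 1.111, 1.222],
    [0.111, 1.111, 0.756, 0.644, 0.756],
    [0.111, 0.467, 0.111, 0.000, 0.111],
    [0.111, 0.467, 0.111, 0.000, 0.111],
    [-0.100, 0.256, -0.100, -0.211, nan],
    [0.367, 0.367, 0.367, 0.256, 0.367],
    [1.000, 0.756, 1.000, 0.889, 1.111],
    [0.656, 0.411, 0.656, 0.544, 0.767],
    [0.356, 0.000, 0.244, 0.133, 0.356],
    [-0.389, -0.744, -0.500, -0.500, -0.389],
    [-0.389, -0.744, -0.500, -0.500, -0.389],
    [-0.033, -0.389, -0.144, -0.144, -0.033],
    [-0.889, -1.244, -1.000, -1.000, -0.889],
    [-1.489, -1.844, -1.600, -0.900, -1.489],
    [-1.489, -1.844, -1.600, -0.900, -1.489],
    [-1.833, -1.944, nan, -1.244, -1.944],
    [-1.244, -1.356, -1.356, -0.656, -1.356],
    [-0.356, -0.356, -0.356, 0.344, -0.356],
    [-0.356, -0.356, -0.356, 0.344, -0.356],
    [0.189, 0.189, 0.189, 0.889, 0.189],
    [0.689, 0.689, 0.689, 1.389, 0.689],
])
labels = ["S1", "S2", "S3", "S4", "S5"]
binding = np.array([1, 1, 2, 2, 0])      # grouping: a priori binding class

# Samples (mutants) as rows, complete positions as variables: shape (5, 22).
X = H[~np.isnan(H).any(axis=1)].T
Xc = X - X.mean(axis=0)                  # centre each position (variable)

classes = np.unique(binding)
# Class centroids and class weights (proportion of samples in each class).
centroids = np.array([Xc[binding == c].mean(axis=0) for c in classes])
weights = np.array([(binding == c).mean() for c in classes])

# Weighted PCA of the centroid table via SVD: the between-group ordination.
_, s, Vt = np.linalg.svd(np.sqrt(weights)[:, None] * centroids,
                         full_matrices=False)
n_axes = len(classes) - 1                # at most k - 1 between-group axes
axes = Vt[:n_axes]                       # loadings over positions
sample_scores = Xc @ axes.T              # project every mutant onto the axes
class_scores = centroids @ axes.T

print("between-group eigenvalues:", np.round(s[:n_axes] ** 2, 4))
for name, score in zip(labels, sample_scores):
    print(name, np.round(score, 3))
```

Because the data are centred, the weighted centroids sum to zero. This is why only k - 1 axes carry information.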


I haven't had time to look at this package yet; I will give you feedback as soon as possible. Thanks for your help.
