Question: Detecting Different Amino Acid Composition Between Two Bacterial Genomes?
Martin A Hansen (Denmark) wrote, 6.5 years ago:

I was asked this question, which I hereby pass on: Is there a standard method to test if the amino acid composition is significantly different between two bacterial genomes?

I guess the question is a bit tricky because "significant" could mean statistically significant or biologically significant.

genome • 2.5k views

This question is ambiguous in more ways than one. Even if you choose one of the terms, what should statistically or biologically significant mean? When you get questions like this, the best thing is to turn them right back to the originator. I have found that most of the time they don't know what they mean. So now you're left trying to come up with an answer to an unspecified question.

written 6.5 years ago by Istvan Albert

@Istvan, I totally agree that the question is not precise enough, but then the role of the bioinformagician is to improve this type of question. I am interested in whether anyone has experience with the subject.

written 6.5 years ago by Martin A Hansen

@Martin: first off, thanks for posting the question to this interesting and very useful medium. Since I suspect I am the originator of the question, I will try to elaborate. Amino acid frequencies are not randomly and uniformly distributed in proteomes, so my question is how to find out whether there is a statistically significant difference between the frequency of an amino acid in one organism and its frequency in the other, knowing that the same amino acid might, for biological reasons, have a significantly different frequency from the rest of the amino acids in the same organism. I tried using a contingency table, but since amino acids are not randomly distributed I figured that it did not give me the right answer. Then I tried calculating the difference in percentage of each amino acid between the two organisms, together with the mean and standard deviation of those differences, and used mean + standard deviation as a threshold to detect significantly large deviations, whatever that means. This gave the biological results I wanted, but I don't know whether it can be used to detect statistical significance. Would it be possible to use the amino acid frequencies in one organism as the expected frequencies and those in the other organism as the observed ones, and then use chi-square? Is that perhaps what has been done in the answer below? I was not aware that this question was such a complicated matter, and I will be thankful for any input.

written 6.5 years ago by Aviaja L. Hauptmann
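
The two approaches described in the comment above can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the residue counts are hypothetical, only four amino acids are shown to keep it short, and `org1` / `org2` are made-up names. Two caveats worth keeping in mind: the percentage differences always sum to zero, so their mean is zero and the "mean + std.dev." threshold reduces to one standard deviation (a descriptive filter, not a significance test); and the goodness-of-fit call treats the frequencies of organism 1 as known exactly, which ignores its own sampling variability.

```r
# Hypothetical residue counts for four amino acids in two proteomes
# (illustrative numbers only, not real genome data).
org1 <- c(A = 95000, G = 73000, L = 101000, W = 15000)
org2 <- c(A = 88000, G = 81000, L = 99000, W = 12000)

# Approach 1 (heuristic from the comment): difference in percentages,
# flagged when it lies more than one standard deviation from the mean.
pct1 <- 100 * org1 / sum(org1)
pct2 <- 100 * org2 / sum(org2)
diff_pct <- pct2 - pct1                     # sums to zero by construction
flagged <- abs(diff_pct - mean(diff_pct)) > sd(diff_pct)
print(round(diff_pct, 2))
print(names(flagged)[flagged])

# Approach 2: chi-square goodness of fit, with organism 1's proportions
# as the expected distribution and organism 2's counts as the observed.
# The observed values must be counts, not percentages.
gof <- chisq.test(x = org2, p = org1 / sum(org1))
print(gof)
```

With proteome-sized counts, even small differences in composition tend to produce very small p-values, so the size of each difference is usually more informative than the p-value alone.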
a.zielezinski wrote, 6.5 years ago:

I would use a chi-square test.

Assume you have the amino acid compositions of the two bacterial genomes in a CSV file, comp.csv:

G, A, C, F, I, L, M, V, W, Y, R, K, H, N, P, Q, S, T, D, E
composition1, 36.2, 6, 0.3, 2.4, 0.7, 1.8, 0.7, 2.7, 0.5, 7.8, 9.2, 1.8, 0.6, 3.6, 2.6, 1.9, 9.7, 1.8, 6.2, 3.5
composition2, 21.3, 6.5, 0.4, 1.4, 0.7, 1.1, 0.3, 2.3, 6.3, 0.4, 4.4, 7, 0.6, 8.1, 3.2, 3.8, 15.2, 4, 8.1, 4.8

Use R for calculations:

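# The header row has one field fewer than the data rows,
# so R takes the first column (composition1, composition2) as row names.
d <- read.csv("comp.csv", header = TRUE, strip.white = TRUE)
chisq.test(d)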

You will get:

Pearson's Chi-squared test

data:  d 
X-squared = 25.8429, df = 19, p-value = 0.1346
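
One caveat on the result above: Pearson's chi-square test is defined on counts, and the values in comp.csv are percentages, so the test effectively assumes each genome contributed only about 100 residues. The sketch below reruns the test on approximate counts to show how strongly the outcome depends on that. The proteome size `n_residues` is a hypothetical value chosen for illustration; real per-genome residue counts should be used where available.

```r
# Percentages copied from the comp.csv example above.
aa <- c("G", "A", "C", "F", "I", "L", "M", "V", "W", "Y",
        "R", "K", "H", "N", "P", "Q", "S", "T", "D", "E")
comp1 <- c(36.2, 6, 0.3, 2.4, 0.7, 1.8, 0.7, 2.7, 0.5, 7.8,
           9.2, 1.8, 0.6, 3.6, 2.6, 1.9, 9.7, 1.8, 6.2, 3.5)
comp2 <- c(21.3, 6.5, 0.4, 1.4, 0.7, 1.1, 0.3, 2.3, 6.3, 0.4,
           4.4, 7, 0.6, 8.1, 3.2, 3.8, 15.2, 4, 8.1, 4.8)
pct <- rbind(composition1 = comp1, composition2 = comp2)
colnames(pct) <- aa

# Hypothetical proteome size (assumption): one million residues per genome.
n_residues <- 1e6
counts <- round(pct / 100 * n_residues)

# Test on percentages (as in the answer) versus test on counts.
# suppressWarnings() hides R's small-expected-value warning for the first call.
res_pct <- suppressWarnings(chisq.test(pct))
res_counts <- chisq.test(counts)
print(res_pct$p.value)
print(res_counts$p.value)

# Standardized residuals show which amino acids contribute most.
stdres <- res_counts$stdres["composition2", ]
print(head(sort(abs(stdres), decreasing = TRUE)))
```

Because the chi-square statistic grows in proportion to the total number of residues, the same percentages give a far smaller p-value once realistic counts are used. This is where the distinction between statistical and biological significance raised in the question becomes practical.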