Question: How to prove that one set of proteins is more conserved than another?

Hi guys, I have two sets of proteins: let's say set A (100 A genes) and set B (100 B genes). I want to show that the genes in set A are more conserved, i.e. less divergent, than those in set B. I built phylogenetic trees, and the branch lengths for set A are much longer than those for set B, but this does not seem to be a good way to compare them. I also tried a network analysis based on protein sequence identity: at the same cutoff, most genes from set A form one big network, whereas genes from set B form multiple separate networks.

Does anyone know a better, more quantitative way to compare them?
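A minimal sketch of how the network observation could be made quantitative: link two sequences when their pairwise identity is at or above a cutoff, then report the number of connected components and the fraction of sequences in the largest one, ideally across a range of cutoffs rather than a single one. The input format (a dict keyed by `frozenset` pairs of hypothetical sequence IDs, holding percent identities from whatever all-versus-all comparison you ran) and the function name `count_clusters` are assumptions for illustration.

```python
from itertools import combinations

def count_clusters(ids, identity, cutoff):
    """Return (number of connected components, fraction of sequences in the
    largest component) for the graph linking pairs with identity >= cutoff.

    ids: list of sequence IDs (hypothetical)
    identity: dict mapping frozenset({id1, id2}) to percent identity
    cutoff: identity threshold in percent
    """
    parent = {s: s for s in ids}

    def find(x):
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        # Missing pairs are treated as 0% identity (never linked)
        if identity.get(frozenset((a, b)), 0.0) >= cutoff:
            parent[find(a)] = find(b)  # merge the two components

    sizes = {}
    for s in ids:
        root = find(s)
        sizes[root] = sizes.get(root, 0) + 1
    return len(sizes), max(sizes.values()) / len(ids)

# Toy data (made-up identities, not real results)
ids = ["a1", "a2", "a3", "a4"]
ident = {frozenset(("a1", "a2")): 80.0,
         frozenset(("a2", "a3")): 55.0,
         frozenset(("a3", "a4")): 30.0}
for cutoff in (20, 50, 90):
    print(cutoff, count_clusters(ids, ident, cutoff))
# 20 (1, 1.0) / 50 (2, 0.75) / 90 (4, 0.25)
```

Plotting both numbers against the cutoff for set A and set B gives a curve for each family instead of a single-cutoff snapshot, which is easier to compare side by side.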

Generate all-versus-all pairwise global alignments within set A and calculate the mean percent identity with its standard deviation. Do the same for the sequences within set B. Then, depending on the distribution of the identities, apply a t-test or a Mann-Whitney test to see whether the difference between the two sets is statistically significant.
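A self-contained sketch of that recipe, assuming a toy scoring scheme (match +1, mismatch -1, linear gap -2) and defining percent identity as identical aligned residues divided by alignment length including gaps; both are assumptions, and other identity definitions exist. All function names and the sequences are hypothetical. For 100 real proteins per set you would more likely use a compiled aligner (for example Biopython's `PairwiseAligner` with BLOSUM62 and affine gaps) and `scipy.stats.mannwhitneyu`; the pure-Python version below only shows the logic.

```python
import math
import statistics
from itertools import combinations

def global_align(s, t, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.
    Toy scoring (assumption); returns two aligned strings with '-' for gaps."""
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(s), len(t)
    # score[i][j] = best score for aligning s[:i] against t[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(s[i - 1], t[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment
    a, b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(s[i - 1], t[j - 1]):
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return "".join(reversed(a)), "".join(reversed(b))

def percent_identity(s, t):
    """Identical aligned residues / alignment length (gaps included) * 100."""
    a, b = global_align(s, t)
    same = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * same / len(a)

def pairwise_identities(seqs):
    """All-versus-all identities within one set: n*(n-1)/2 values."""
    return [percent_identity(s, t) for s, t in combinations(seqs, 2)]

def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).
    Returns (U for x, z, p)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    order = sorted(range(len(pooled)), key=lambda k: pooled[k])
    ranks, tie_term, i = [0.0] * len(pooled), 0, 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of a tie run
        tie_term += (j - i + 1) ** 3 - (j - i + 1)
        i = j + 1
    n = n1 + n2
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, z, math.erfc(abs(z) / math.sqrt(2))

# Toy example (made-up sequences, not real data)
set_a = ["MKTAYIAKQR", "MKTAYIAKQL", "MKSAYIAKQR", "MKTAYLAKQR"]
set_b = ["MKTAYIAKQR", "MPLGWVSEQR", "AKQLDIHTRS", "MSTNPKPQRK"]
ident_a, ident_b = pairwise_identities(set_a), pairwise_identities(set_b)
for name, vals in (("A", ident_a), ("B", ident_b)):
    print(name, round(statistics.mean(vals), 1), round(statistics.stdev(vals), 1))
print(mann_whitney(ident_a, ident_b))
```

One caution: every sequence contributes to many pairs, so the pairwise identities are not independent observations and the resulting p-value is likely to be optimistic; a resampling scheme over sequences rather than over pairs is safer.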

Thanks a lot, this is a feasible approach.

Are A all orthologues of one another and likewise for B?

If you got long branch lengths, that means either A is less conserved or your alignment isn't very good.

You could try dN/dS analyses.
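For reference, dN/dS (also written Ka/Ks, or ω) is the ratio of the nonsynonymous to the synonymous substitution rate, so it has to be estimated from codon-aligned coding DNA rather than from the protein sequences alone:

```latex
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega < 1 & \text{purifying selection} \\
\omega \approx 1 & \text{neutral evolution} \\
\omega > 1 & \text{positive selection}
\end{cases}
```

Here d_N is the number of nonsynonymous substitutions per nonsynonymous site and d_S the number of synonymous substitutions per synonymous site.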

Thanks for your reply. Set A and set B come from two different Pfam protein families. Within each set, the proteins are similar to one another. Regarding branch length, I agree with you: a longer branch does not directly imply lower conservation. dN/dS (Ka/Ks) is used to show the balance of selection, so I doubt it can help compare the degree of divergence between two different sets of proteins.

dN/dS would tell you if one group is subject to more drift than the other, which implies less conservation, but it isn't a direct measure, I agree.

What I mean about branch length is this: assuming your alignments are OK, you already have your answer, namely that A is more divergent than B. But it sounds like you are looking for data to confirm a hypothesis you've already decided the answer to...

I don't see why you think that isn't a good comparator.

Thanks for your reply. Sorry, I did not make it clear. When I said the branch lengths of set A are longer than those of set B, I meant the scale bar of each tree. The branch lengths could be compared if I could find a way to do it; do you have any ideas about that?
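If both trees were built with the same program and substitution model, their branch lengths are in the same units (expected substitutions per site), so one simple number to compare is the total tree length, i.e. the sum of all branch lengths. Because both sets contain 100 sequences, the totals can be compared directly; dividing by the number of tips helps when set sizes differ. A minimal sketch with a deliberately simple regex-based Newick reader (assumption: no quoted labels containing `:` or `,`); the function names and toy trees are hypothetical, and in practice Biopython's `Bio.Phylo` trees offer `total_branch_length()`. A single value per tree still gives no significance test; that would need a distribution, for example from bootstrap replicate trees.

```python
import re

# Branch lengths are the numbers that follow ':' in a Newick string;
# support values such as the '95' in ')95:0.05' are not preceded by ':'.
_BRANCH = re.compile(r":\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)")

def tree_length(newick):
    """Total tree length: the sum of all branch lengths."""
    return sum(float(x) for x in _BRANCH.findall(newick))

def n_tips(newick):
    """Leaf count: a Newick tree has exactly (leaves - 1) commas.
    Assumes no commas inside quoted labels."""
    return newick.count(",") + 1

def length_per_tip(newick):
    """Tree length normalised by the number of sequences in the tree."""
    return tree_length(newick) / n_tips(newick)

# Toy trees (made-up, not the poster's data)
tree_a = "((A1:0.10,A2:0.20)95:0.05,A3:0.30);"
tree_b = "((B1:0.02,B2:0.03)80:0.01,B3:0.04);"
print(round(tree_length(tree_a), 3), round(tree_length(tree_b), 3))
print(round(length_per_tip(tree_a), 3), round(length_per_tip(tree_b), 3))
```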

As for your point that 'you are looking for data to confirm a hypothesis you've already decided the answer to': yes, I am trying to confirm something I already believe. Most proteins in set A share over 50% identity with each other, which is not the case in set B, so I think set A is more conserved. I then looked for an approach to prove it and asked this question on Biostars.

But if you know, through some means, that the A proteins are over 50% identical and the B proteins are not, and the scale bar on your tree is larger (which means your branch lengths should be too), then why not use the technique you've already apparently used, which has already given you the answer?

To put it another way: how do you already know A is more conserved than B before you test it?

As you can see, over 50% identity and longer branches are preliminary observations. But I am looking for a quantitative way to show the difference clearly. For example, simply seeing that the scale bars differ is not strong proof; reviewers, and even I, would ask whether the difference passes a statistical test. I have found one possible way to do it, as shown by @a.zielezinski.
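One caveat when testing raw pairwise identities: each sequence takes part in many pairs, so the 4,950 identities from 100 sequences are not independent, and a t-test or Mann-Whitney p-value on them will tend to be too optimistic. A hedged alternative, not something proposed in the thread, is to treat the sequence as the resampling unit: compute each set's mean pairwise identity, estimate its standard error with a leave-one-sequence-out jackknife, and compare the two means with an approximate z-test. The sketch assumes you already have a symmetric n x n percent-identity matrix per set; the function names and toy matrices are hypothetical.

```python
import math

def mean_pairwise(mat, keep):
    """Mean of mat[i][j] over all pairs i < j drawn from the index list keep."""
    vals = [mat[i][j] for a, i in enumerate(keep) for j in keep[a + 1:]]
    return sum(vals) / len(vals)

def jackknife_mean_identity(mat):
    """Mean pairwise identity and its leave-one-sequence-out jackknife SE.
    mat: symmetric n x n matrix of percent identities, n >= 3."""
    n = len(mat)
    full = mean_pairwise(mat, list(range(n)))
    # Recompute the statistic with each sequence (not each pair) left out
    loo = [mean_pairwise(mat, [i for i in range(n) if i != k]) for k in range(n)]
    loo_mean = sum(loo) / n
    se = math.sqrt((n - 1) / n * sum((v - loo_mean) ** 2 for v in loo))
    return full, se

def compare_sets(mat_a, mat_b):
    """Approximate two-sided z-test for a difference in mean identity."""
    mean_a, se_a = jackknife_mean_identity(mat_a)
    mean_b, se_b = jackknife_mean_identity(mat_b)
    z = (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))
    return mean_a, mean_b, z, p

# Toy identity matrices (made-up values, 4 sequences per set)
mat_a = [[100, 80, 75, 70],
         [80, 100, 85, 78],
         [75, 85, 100, 72],
         [70, 78, 72, 100]]
mat_b = [[100, 30, 35, 25],
         [30, 100, 40, 28],
         [35, 40, 100, 33],
         [25, 28, 33, 100]]
print(compare_sets(mat_a, mat_b))
```

With 100 sequences per set this only needs 100 leave-one-out means per set, so it runs quickly once the identity matrices exist.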
