Question

What Is A Good Measure Of Sequence Divergence?

5

13.3 years ago

John ▴ 790

I have 10 different species of bacteria, and I have sequenced 4 genetic markers from each of them. I have made phylogenetic trees for each marker and for all markers combined (this may be irrelevant to the question). I would like to know whether one particular marker has been under more selection than the others, e.g. how much evolution has taken place in marker 1 vs. marker 2. Is there more sequence diversity among the 10 sequences for marker 1 than among the 10 sequences for marker 2? I'm not sure which metric to use, or whether I can use the branch lengths of the trees as my metric.

distance phylogenetics tree evolution sequence • 4.0k views

updated 13.3 years ago by Amr ▴ 160 • written 13.3 years ago by John ▴ 790

score 6 · Answer 1 · 2011-01-09

John,

"Divergence" and selection are really two different things. After all, one set of homologous genes could be more divergent than another simply because of a higher mutation rate.
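If all you want is a raw divergence number per marker, one simple option is the mean pairwise p-distance (the proportion of differing sites, averaged over all pairs of sequences). Here is a minimal sketch in plain Python. It assumes each marker is already aligned so that all sequences have the same length; the function names and the toy alignments are made up for illustration. Keep in mind that this measures divergence only and says nothing about selection.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap or ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            differing += x != y
    return differing / compared if compared else float("nan")


def mean_pairwise_distance(seqs):
    """Average p-distance over all unordered pairs of sequences."""
    dists = [p_distance(a, b) for a, b in combinations(seqs, 2)]
    return sum(dists) / len(dists)


# Hypothetical toy alignments, three sequences per marker.
marker1 = ["ACGTACGT", "ACGTACGA", "ACGTTCGA"]
marker2 = ["ACGTACGT", "ACGTACGT", "ACGTACGA"]

for name, seqs in [("marker1", marker1), ("marker2", marker2)]:
    print(name, round(mean_pairwise_distance(seqs), 4))
```

The uncorrected p-distance underestimates divergence when sequences are distant, because it ignores multiple substitutions at the same site, so for more diverged markers a model-corrected distance would be the safer choice.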

The most common way to test for selection on protein-coding genes is to compare the rate of non-synonymous substitutions to the rate of synonymous substitutions in each marker (look up Ka/Ks tests, also written dN/dS). In theory, a ratio > 1 is evidence for positive selection and a ratio < 1 is evidence for purifying selection (selection to make the gene change, and selection to keep it the same, respectively).

In practice, so much selection is purifying that it's uncommon to get ratios greater than 1, since the test averages over the whole sequence. Wikipedia actually has an article on these tests that might be helpful.
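To make the synonymous vs. non-synonymous distinction concrete, here is a rough sketch that classifies codon differences between two aligned coding sequences. It assumes the sequences are in frame and aligned codon by codon, and that the standard codon assignments apply. It only returns raw counts: it skips codons that differ at more than one position and does no normalisation by the number of synonymous and non-synonymous sites, so it is not a Ka/Ks estimate. For a real analysis, use a dedicated program such as codeml or yn00 from the PAML package. The function name and example sequences are made up.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)
}


def count_syn_nonsyn(seq1, seq2):
    """Count synonymous and non-synonymous codon differences.

    Only codons differing at exactly one position are classified; codons
    with gaps, ambiguous bases, or multiple changes are skipped.
    """
    syn = nonsyn = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1 = seq1[i:i + 3].upper()
        c2 = seq2[i:i + 3].upper()
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue
        if sum(x != y for x, y in zip(c1, c2)) != 1:
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# Hypothetical pair: GCT -> GCC keeps alanine, AAA -> GAA changes Lys to Glu.
syn, nonsyn = count_syn_nonsyn("ATGGCTAAA", "ATGGCCGAA")
print("synonymous:", syn, "non-synonymous:", nonsyn)
```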

score 0 · Answer 2 · 2012-01-20

Ah, I did my MSc thesis on something like this, except I had 6 markers, and hundreds of isolates of the same species!

There is a technique called MLST (multilocus sequence typing) which has recently become the industry standard for bacterial species delineation and even just characterisation.

I highly recommend you check out this database. Explore the whole site; I'm sure there are plenty of tools that would be relevant to your work. In particular, try the BURST algorithm, which will do a good job of graphically representing the distance between your isolates. It will treat the different sequences at each of your markers as "alleles". It's designed for much larger datasets, but I would definitely give it a try!
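To show what treating sequences as "alleles" means in practice, here is a small sketch of the bookkeeping step only, not of BURST itself. It assumes each distinct sequence at a marker gets its own allele number and that two isolates are compared by counting the markers at which their allele numbers differ. The function names, isolate labels, and sequences are all made up.

```python
def allele_profiles(markers):
    """Turn per-marker sequences into allelic profiles.

    markers maps marker name -> {isolate name -> sequence}. Within each
    marker, distinct sequences are numbered 1, 2, ... in order of first
    appearance. Returns {isolate name -> tuple of allele numbers}.
    """
    profiles = {}
    for seqs in markers.values():
        allele_ids = {}
        for isolate, seq in seqs.items():
            allele = allele_ids.setdefault(seq.upper(), len(allele_ids) + 1)
            profiles.setdefault(isolate, []).append(allele)
    return {isolate: tuple(alleles) for isolate, alleles in profiles.items()}


def loci_differing(profile_a, profile_b):
    """Number of markers at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


# Hypothetical data: two markers, three isolates.
markers = {
    "marker1": {"A": "ACGT", "B": "ACGT", "C": "ACGA"},
    "marker2": {"A": "TTGA", "B": "TTGC", "C": "TTGC"},
}

profiles = allele_profiles(markers)
for isolate, profile in profiles.items():
    print(isolate, profile)
print("A vs C differ at", loci_differing(profiles["A"], profiles["C"]), "markers")
```

Note that this approach counts any sequence difference at a marker as a new allele, whether it is one substitution or twenty, so it discards exactly the per-marker divergence information the original question is about.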