Question

What is nucleotide substitution rate (of aligned sequences)?

8.6 years ago

Akhil • 0

I was reading research papers comparing the chimp and human genomes, and I came across this term many times. It is reported as a percentage.

evolution molecular-biology genomics genetics • 2.2k views

8.6 years ago

thackl ★ 3.0k

EDIT: My answer is bogus - must have banged my head somewhere yesterday. I was thinking in too simple terms here, although I definitely should have known better.

Since Akhil was talking about genome alignments and substitutions in %, I failed to recognize the "rate" part, which of course implies a time component and an evolutionary context. And as cyril-cros pointed out, the term is very well defined in that respect.

On a side note, I have been working a lot with PacBio and other sequencing data recently, and people often use the term "error rate" there. This discussion reminded me that the term seems to be poorly chosen...

Wrong stuff:

I'm assuming it refers to the fraction of nucleotides that differ between the human and chimp sequences:

 *     *
TTCGACGAATCG human
TACCATCGAACG chimp

Substitution rate would be 16.7 % (2/12).

Your example is flawed, and you have to apply the Jukes-Cantor correction.
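For readers who want to see what the correction does, here is a minimal sketch. It assumes the two 12-base sequences from the example above are already aligned without gaps, and the function names `p_distance` and `jukes_cantor_distance` are made up for illustration. Note that counting the columns as written gives 6 differing sites, not the 2 that are starred. The Jukes-Cantor formula d = -3/4 ln(1 - 4p/3) then turns the observed proportion p into an estimate of substitutions per site, allowing for multiple changes at the same site.

```python
import math

def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped sites at which the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip columns where either sequence has a gap character.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    mismatches = sum(1 for a, b in pairs if a != b)
    return mismatches / len(pairs)

def jukes_cantor_distance(p):
    """Jukes-Cantor estimate of substitutions per site from observed proportion p."""
    if not 0 <= p < 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)

# Sequences copied from the example in this thread.
human = "TTCGACGAATCG"
chimp = "TACCATCGAACG"

p = p_distance(human, chimp)
d = jukes_cantor_distance(p)
print(f"observed proportion of differing sites: {p:.3f}")   # 0.500
print(f"Jukes-Cantor corrected distance:        {d:.3f}")   # 0.824
```

Neither number is a rate yet: both are distances per site, with no time component.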

kloetzl ★ 1.1k • 8.6 years ago

Yep, not my best work, thanks...

thackl ★ 3.0k • 8.6 years ago

Ram · Accepted Answer · 2015-09-13

http://onlinelibrary.wiley.com/doi/10.1038/npg.els.0005109/pdf for a definition

https://en.wikipedia.org/wiki/Substitution_model

https://en.wikipedia.org/wiki/Sequence_alignment

Basically, during evolution DNA accumulates random changes (mutations), some of which become fixed in a population, for example through genetic drift. Quantifying the rate at which these substitutions accumulate can serve as a molecular clock or support phylogenetic analysis.

Many models exist, but they are often Markov chains: at each time step, a nucleotide can change or stay the same. Changes among the 4 bases are not equally likely. For example, transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine, or the reverse) occur with different probabilities, so some corrections have to be applied. This type of model does not take the effects of selection or population size into account.
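To make the Markov chain idea concrete, here is a small sketch of a discrete-time model with separate probabilities for transitions and transversions (in the spirit of the Kimura two-parameter model). The per-step probabilities, the number of steps and all function names are illustrative assumptions, not values from any paper or tool.

```python
BASES = "ACGT"
PURINES = {"A", "G"}

def step_matrix(transition_prob, transversion_prob):
    """One-step matrix: entry [i][j] is P(base i -> base j) in a single time step.

    transition_prob: probability of the single possible transition (A<->G or C<->T).
    transversion_prob: probability of each of the two possible transversions.
    """
    matrix = []
    for src in BASES:
        row = []
        for dst in BASES:
            if src == dst:
                row.append(1.0 - transition_prob - 2.0 * transversion_prob)
            elif (src in PURINES) == (dst in PURINES):
                row.append(transition_prob)      # same chemical class
            else:
                row.append(transversion_prob)    # purine <-> pyrimidine
        matrix.append(row)
    return matrix

def multiply(a, b):
    """Plain 4x4 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def after_steps(matrix, steps):
    """Substitution probabilities after the given number of time steps."""
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for _ in range(steps):
        result = multiply(result, matrix)
    return result

# Hypothetical per-step probabilities, chosen only for illustration.
one_step = step_matrix(transition_prob=0.002, transversion_prob=0.0005)
many_steps = after_steps(one_step, 100)

# Probability that a site starting as A is observed as something else.
p_differs = 1.0 - many_steps[0][0]
print(f"P(A is observed as another base after 100 steps) = {p_differs:.4f}")
```

The observed probability of a difference comes out smaller than 100 times the per-step change probability, because some sites change more than once or change back. That gap is what distance corrections are meant to recover.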

Here, you are probably less interested in the percentage of differences than in the number of substitutions per site per unit of time. The term will almost certainly be defined in the Materials and Methods section, where the authors should also mention the tool they used.
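As a rough sketch of how a distance becomes a rate: if two lineages split T years ago and both evolve at the same constant rate r (a molecular clock assumption), the expected distance per site is d = 2rT, so r = d / (2T). The function name and the numbers below are placeholders for illustration only, not values taken from the papers in question.

```python
def substitution_rate(distance_per_site, divergence_time_years):
    """Substitutions per site per year under a simple molecular clock.

    Assumes both lineages have evolved at the same constant rate since they
    split, so the distance between them accumulated along two branches:
    d = 2 * rate * T.
    """
    if divergence_time_years <= 0:
        raise ValueError("divergence time must be positive")
    return distance_per_site / (2.0 * divergence_time_years)

# Placeholder inputs, for illustration only.
example_distance = 0.012        # corrected substitutions per site
example_divergence = 6.0e6      # years since the two lineages split

rate = substitution_rate(example_distance, example_divergence)
print(f"{rate:.2e} substitutions per site per year")  # 1.00e-09
```

A paper that reports the value as a percentage is most likely quoting the distance (or a per-time-unit rate scaled by 100), which is why checking the methods section matters.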