What is nucleotide substitution rate (of aligned sequences)?
2
0
5.9 years ago
Akhil • 0

I was reading research papers comparing the chimp and human genomes, and I came across this term many times. It is reported as a percentage.

evolution genomics molecular biology genetics • 1.5k views
3
5.9 years ago
cyril-cros ▴ 910

http://onlinelibrary.wiley.com/doi/10.1038/npg.els.0005109/pdf for a definition

https://en.wikipedia.org/wiki/Substitution_model

https://en.wikipedia.org/wiki/Sequence_alignment

Basically, during evolution your DNA accumulates random changes (mutations, some of which spread and become fixed through genetic drift). Quantifying the rate of these changes can serve as a molecular clock or support phylogenetic work.

Many models exist, and most are built on Markov chains: at each time step, a nucleotide either changes or stays the same. Changes between the 4 bases are not equally likely: transitions (purine to purine, or pyrimidine to pyrimidine) usually occur more often than transversions (purine to pyrimidine or the reverse), for example, so some corrections have to be applied. This type of model does not take into account the effects of evolutionary selection or population size.
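A minimal Python sketch of the Markov chain idea, in case it helps. Everything specific here is an assumption made for illustration: the function names (`is_transition`, `step`, `evolve`), the per-step probabilities and the seed do not come from any paper or tool. It only shows how "change or stay the same at each time step" can be written down when transitions are more likely than transversions.

```python
import random

BASES = "ACGT"
PURINES = {"A", "G"}  # C and T are the pyrimidines


def is_transition(a, b):
    """True if a -> b is a change within purines or within pyrimidines."""
    return a != b and ((a in PURINES) == (b in PURINES))


def step(base, p_transition, p_transversion, rng):
    """One Markov step for a single site.

    p_transition is the probability of the single possible transition;
    p_transversion is the probability of EACH of the two possible
    transversions (assumed values, for illustration only).
    """
    r = rng.random()
    if r < p_transition:
        return next(b for b in BASES if is_transition(base, b))
    if r < p_transition + 2 * p_transversion:
        others = [b for b in BASES if b != base and not is_transition(base, b)]
        return rng.choice(others)
    return base  # no change at this step


def evolve(seq, steps, p_transition=0.002, p_transversion=0.0005, seed=0):
    """Apply `steps` independent Markov steps to every site of `seq`."""
    rng = random.Random(seed)
    sites = list(seq)
    for _ in range(steps):
        sites = [step(b, p_transition, p_transversion, rng) for b in sites]
    return "".join(sites)


if __name__ == "__main__":
    ancestor = "ACGT" * 10
    print(ancestor)
    print(evolve(ancestor, steps=200))
```

Note that a site can change and later change back, so the number of visible differences between ancestor and descendant is generally smaller than the number of substitutions that actually happened.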

Here, you are probably less interested in the raw percentage of differences than in the number of differences per site per unit of time. The term will almost certainly be defined in the Materials and Methods section, where the authors should also mention the tool they used.
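To make "differences per site per unit of time" concrete, here is a small Python sketch. The function name `substitution_rate` and the input values are assumptions chosen for illustration (roughly 1.2% divergence and a 6-million-year split), not numbers taken from the papers in question. The factor of 2 is there because the differences between two present-day sequences accumulated along both lineages since their common ancestor.

```python
def substitution_rate(distance, divergence_time_years):
    """Substitutions per site per year along one lineage.

    distance: substitutions per site between two present-day sequences
    divergence_time_years: time since their common ancestor
    """
    if divergence_time_years <= 0:
        raise ValueError("divergence time must be positive")
    # Both lineages have been evolving for the same time, hence 2 * t.
    return distance / (2 * divergence_time_years)


if __name__ == "__main__":
    # Illustrative inputs only (assumed, not from the thread).
    d = 0.012  # 1.2% of sites differ
    t = 6e6    # years since the split
    print(substitution_rate(d, t))  # on the order of 1e-9 per site per year
```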

0

Thank you very much. I am a beginner in genomics and am having a hard time understanding many terms. What you wrote gave me a good idea of what this one means.

1
5.9 years ago
thackl ★ 2.8k

EDIT: my answer is bogus - must have banged my head somewhere yesterday. I was thinking in too simple terms here, although I definitely should have known better.

Since Akhil was talking about genome alignments and substitutions in %, I failed to recognize the rate part, which of course implies a time component and an evolutionary context. And as cyril-cros pointed out, the term is very well defined in that respect.

On a side note, I am working a lot with PacBio and other sequencing data recently and people often use the term error rates there. This discussion reminded me that this term seems to be poorly chosen...

Wrong stuff:

I'm assuming it refers to the fraction of nucleotides that differ between the human and chimp sequences.

 *     *
TTCGACGAATCG human
TACCATCGAACG chimp

Substitution rate would be 16.7 % (2/12).

2

Your example is flawed, and you have to apply the Jukes-Cantor correction, which accounts for multiple substitutions hitting the same site.

0

Yep, not my best work, thanks...