Question: Most suitable alignment program for Ka/Ks analysis of whole genome coding sequences
6.2 years ago by
Ljubljana, Slovenia
Cytosine • 450 wrote:


I've been trying to get Ka/Ks scores from genome-wide pairwise alignments of coding sequences from two strains of the same organism, so the sequences are very similar.

So far I'm using a custom Python script to remove stop codons, Muscle to align the sequences, and then piping the result into KaKs Calculator. About 10% of the CDSs come out with a Ka/Ks ratio greater than 1, which I find highly unlikely and attribute to bad alignments.
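The stop-codon step is not shown in the post, so the following is only a minimal sketch of what such a script might do, assuming the standard genetic code and that each CDS is given as a plain in-frame nucleotide string. The function name `strip_terminal_stop` is hypothetical.

```python
# Stop codons of the standard genetic code (an assumption; other codes differ).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def strip_terminal_stop(cds):
    """Return the CDS without its trailing stop codon.

    Returns None when the sequence is not a whole number of codons or
    contains an internal stop, since such a CDS is likely to give a
    misleading Ka/Ks value downstream.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        return None  # frame is broken, do not guess
    if cds[-3:] in STOP_CODONS:
        cds = cds[:-3]
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if any(codon in STOP_CODONS for codon in codons):
        return None  # internal stop codon
    return cds
```

Rejecting out-of-frame sequences instead of trimming them keeps frame errors from silently turning into apparent non-synonymous changes.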

What alignment programs do you use when trying to align coding sequences (with different sizes and indels) for Ka/Ks analysis?
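One commonly used way to keep indels in frame is to align the translated proteins and then thread the original codons back onto that protein alignment. The sketch below shows only the threading step; it assumes the protein alignment row already exists and that the CDS has had its stop codon removed. The name `thread_codons` is hypothetical, and this is not necessarily what anyone in this thread used.

```python
def thread_codons(aligned_protein, cds, gap="-"):
    """Build a codon-alignment row from a gapped protein row and its ungapped CDS.

    Each residue is replaced by the codon that encodes it and each
    protein gap becomes three nucleotide gaps, so gaps never split a codon.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length does not match the ungapped protein length")
    row = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            row.append(gap * 3)
        else:
            row.append(cds[pos:pos + 3])
            pos += 3
    return "".join(row)
```

For example, `thread_codons("M-K", "ATGAAA")` places the gap between the two codons, giving `ATG---AAA`.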




sequence alignment genome • 3.1k views
6.2 years ago by
Juke34 • 4.8k wrote:


Have a look at the "Alignment and cleaning" paragraph of this publication:

But I hope you left some details out of your explanation; otherwise I don't understand how you can calculate Ka/Ks within the same organism. You normally need other species in order to estimate the ancestral state of each codon and so determine what kind of event occurred.


Hey, thanks a lot for the link! The basic idea here is that one strain developed from the other, so I'm treating the originator strain as the ancestor and comparing the derived strain against it. Hopefully that makes sense?

written 6.2 years ago by Cytosine • 450


Using the originator as the ancestor seems correct, but only if its sequence corresponds to the real ancestral state (for example, if you sequenced that strain in the past). In other words, if the sequence you consider ancestral was sampled at the same time as your derived strain, it has itself evolved independently since it gave rise to the other strain.

Often we only have contemporary sequences, and the tools infer the ancestral state of the sequences under study.

About the large fraction of sequences with a value greater than one:

You have to be careful about the effect of polymorphism. If the divergence time between your two sequences is very short, your Ka will be inflated and your result biased. Mutations arise at random, whether they are synonymous or non-synonymous, but selection needs some time to act on them: mutations that reduce fitness or cause serious problems are eventually removed, and beneficial ones are retained.
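With so little divergence, each gene's ratio may rest on only a handful of changes, so a single non-synonymous difference and no synonymous ones can push it above 1. As a rough check, one can tally how many codons differ in each gene and whether each difference changes the amino acid. The sketch below assumes the standard genetic code and an in-frame codon alignment; `count_codon_changes` is a hypothetical helper. It returns raw counts only, not Ka or Ks, which also require normalising by the number of synonymous and non-synonymous sites.

```python
from itertools import product

# Standard genetic code in TCAG order (an assumption; other codes differ).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def count_codon_changes(seq_a, seq_b):
    """Count differing codons as (synonymous, non_synonymous).

    Codons containing gaps or ambiguous bases are skipped. A codon with
    several changed positions is still counted once.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and in frame")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # gap or ambiguous base
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn
```

Genes where both counts are very small are the ones whose Ka/Ks value is least reliable, whatever aligner was used.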

written 6.2 years ago by Juke34 • 4.8k