Question

Most suitable alignment program for Ka/Ks analysis of whole genome coding sequences

0

9.8 years ago

Cytosine ▴ 460

Hi,

I've been trying to get Ka/Ks scores from genome-wide pairwise alignments of coding sequences from two strains of the same organism (i.e. with very high sequence similarity).

So far I'm using a custom Python script to remove stop codons, MUSCLE to align the sequences, and then piping the alignments into KaKs Calculator. About 10% of the CDS come out with a Ka/Ks ratio greater than 1, which I find highly unlikely and attribute to bad alignments.

What alignment programs do you use when trying to align coding sequences (with different sizes and indels) for Ka/Ks analysis?
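One common way to keep indels from breaking the reading frame is to align the translated proteins and then thread the gaps back onto the CDS, so that every gap spans a whole codon. The following is only a minimal sketch of that idea, not the method used by any particular tool: it assumes the gapped protein row comes from an external protein aligner, and the function name `thread_codons` is hypothetical.

```python
def thread_codons(cds: str, aligned_protein: str, gap: str = "-") -> str:
    """Project one gapped protein alignment row back onto its ungapped CDS.

    Hypothetical helper: each residue consumes one codon of the CDS and
    each protein gap becomes a three-base gap, so the frame is preserved.
    """
    cds = cds.upper()
    # Drop a terminal stop codon, if present, so the CDS matches the protein.
    if cds[-3:] in {"TAA", "TAG", "TGA"}:
        cds = cds[:-3]

    n_residues = sum(1 for ch in aligned_protein if ch != gap)
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length does not match the protein sequence")

    codons = []
    pos = 0
    for ch in aligned_protein:
        if ch == gap:
            codons.append(gap * 3)          # whole-codon gap
        else:
            codons.append(cds[pos:pos + 3])  # next codon of the CDS
            pos += 3
    return "".join(codons)


print(thread_codons("ATGGCTAAATGA", "MA-K"))  # prints ATGGCT---AAA
```

Applying the same function to both strains' rows of the protein alignment gives a pairwise codon alignment of equal length that can be handed to a Ka/Ks estimator.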

genome sequence alignment • 3.9k views

updated 2.4 years ago by Ram 43k • written 9.8 years ago by Cytosine ▴ 460

Ram · Answer 1 · 2014-07-28

1

9.8 years ago

Juke34 8.5k

Hi,

Have a look at the "Alignment and cleaning" paragraph of this publication.

But I hope you left some things out of your explanation... otherwise I don't understand how you can calculate Ka/Ks within the same organism. You need other species in order to estimate the ancestral state of each codon and determine what kind of event occurred.

updated 2.4 years ago by Ram 43k • written 9.8 years ago by Juke34 8.5k

0

Hey, thanks a lot for the link! The basic idea here is that one of the strains is the originator of the other strain. One strain developed from the other and I'm using the originator as the ancestor and the other one to compare to that ancestral strain. Hopefully that makes sense?

updated 2.4 years ago by Ram 43k • written 9.8 years ago by Cytosine ▴ 460

0

Hey,

Using the originator as the ancestor seems correct, but only if this sequence corresponds to the real ancestral state (for example, if you sequenced the strain in the past). In other words, if the sequence you consider ancestral is a contemporary sequence of your strain, it has kept evolving independently since it gave rise to the other strain.

Often we use only contemporary sequences, and the tools infer the ancestral state of the studied sequences.

About the large fraction of sequences with a value greater than one:

You have to be careful about the polymorphism effect. If the divergence time between your two sequences is really short, your Ka (Kn) will be inflated and your result biased. Synonymous and non-synonymous mutations occur randomly at the same rate, but some time is needed for purifying selection to play its role (mutations that reduce fitness or cause serious problems are not retained, and vice versa).
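To get a feel for how little divergence there is per gene, one option is to count the codon differences in each pairwise codon alignment before trusting its ratio. This is only a rough sketch: it is a raw count of differing codons under the standard genetic code, not a Ka/Ks estimator, and the function names and the `min_diffs` threshold are hypothetical choices for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codon_differences(aln_a: str, aln_b: str) -> tuple[int, int]:
    """Return (synonymous, non-synonymous) counts of differing codons."""
    if len(aln_a) != len(aln_b) or len(aln_a) % 3:
        raise ValueError("expected two in-frame codon alignments of equal length")
    syn = nonsyn = 0
    for i in range(0, len(aln_a), 3):
        ca, cb = aln_a[i:i + 3].upper(), aln_b[i:i + 3].upper()
        # Skip identical codons and codons with gaps or ambiguous bases.
        if ca == cb or ca not in CODON_TABLE or cb not in CODON_TABLE:
            continue
        if CODON_TABLE[ca] == CODON_TABLE[cb]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


def too_few_substitutions(aln_a: str, aln_b: str, min_diffs: int = 5) -> bool:
    """Flag pairs with no synonymous change or very few differences overall.

    min_diffs is an arbitrary illustrative threshold, not a recommended value.
    """
    syn, nonsyn = count_codon_differences(aln_a, aln_b)
    return syn == 0 or syn + nonsyn < min_diffs


print(count_codon_differences("ATGGCTAAA", "ATGGCCAGA"))  # prints (1, 1)
```

Genes flagged this way carry only a handful of substitutions, so a ratio above 1 for them may reflect chance rather than selection.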

updated 2.4 years ago by Ram 43k • written 9.8 years ago by Juke34 8.5k