Reading frame after aligning DNA for codeml analysis

8 months ago

sodiumnitrate ▴ 20

I have a bunch of protein-coding DNA sequences in the correct reading frame, such that upon translation, I get the protein sequences.

If I align the corresponding amino acid sequences, each gap corresponds to a whole codon. If I align the DNA sequences directly, however, the aligner can insert gaps whose lengths are not multiples of 3, causing frameshifts.
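To make the problem concrete, here is a minimal sketch of how one could flag the affected codons in a single aligned DNA sequence. It assumes the alignment starts in frame, that gaps are written as `-`, and that downstream tools read the alignment columns in consecutive triplets; the function name `broken_codons` is just an illustration, not part of any package.

```python
def broken_codons(aligned_seq):
    """Return 1-based codon positions whose three columns mix gaps and bases."""
    bad = []
    # Walk the aligned sequence in consecutive triplets, ignoring a trailing partial triplet.
    usable = len(aligned_seq) - len(aligned_seq) % 3
    for i in range(0, usable, 3):
        codon = aligned_seq[i:i + 3]
        n_gaps = codon.count("-")
        # A clean codon is either all bases (0 gaps) or fully gapped (3 gaps).
        if 0 < n_gaps < 3:
            bad.append(i // 3 + 1)
    return bad


print(broken_codons("ATGAAA---TTT"))  # gap is a whole codon
print(broken_codons("ATGA-AATTT--"))  # gaps split codons 2 and 4
```

An empty list means every gap in that sequence falls on codon boundaries; any reported position is a triplet that can no longer be read as a codon.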

With these DNA alignments, I can build decent trees and then look for positive selection using site models in codeml.

I then end up having sites like this:

  16 *   0.99944 0.00050 0.00006 ( 1)  0.051 +-  0.028

where * marks the codon in the first sequence that was broken by the DNA alignment.

Is the result of this analysis then wrong? Should I be aligning in amino acid space, converting back to DNA (with the correct codons in the DNA), and doing all the subsequent analyses based on that?
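What I mean by converting back to DNA could be sketched roughly as follows: thread each original coding sequence onto its aligned protein, so that every amino acid gets its own codon and every protein gap becomes `---`. This assumes the unaligned DNA is in frame, has no terminal stop codon, and translates exactly to the ungapped protein; `thread_codons` is a hypothetical helper name, not an existing tool.

```python
def thread_codons(aligned_protein, dna):
    """Map an aligned protein back onto its coding DNA, one codon per residue."""
    dna = dna.replace("-", "")
    if len(dna) % 3 != 0:
        raise ValueError("DNA length is not a multiple of 3")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(codons) != n_residues:
        raise ValueError("number of codons does not match number of residues")
    # Consume one codon per residue; each protein gap becomes a three-column gap.
    remaining = iter(codons)
    return "".join("---" if aa == "-" else next(remaining) for aa in aligned_protein)


print(thread_codons("M-KF", "ATGAAATTT"))
```

By construction, every gap in the resulting DNA alignment is a multiple of 3 and no codon is split, though the sketch does not verify that each codon actually encodes the residue it is paired with.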

Further, how are gaps (-) or gibberish codons (*) treated in PAML? Are they ignored?

codeml reading-frame dna alignment • 279 views

updated 8 months ago by Ram 43k • written 8 months ago by sodiumnitrate ▴ 20
