Hello people,

I need to calculate the Ka/Ks ratio. Here is what I have:

1. Assembled transcripts (assembled with Trinity)
2. 100 gene sequences from a specific gene family
3. 80 protein sequences from the aforementioned genes

Q1: How do I build the dataset for the phylogenetic tree? Should I combine 1. (only the transcripts that mapped to the gene sequences, of course) with 2., or should I use only the sequences from 1. that mapped to sequences in 2.?

Q2: How do I calculate Ka/Ks using my transcript sequences (1.) and the reference protein sequences (3.)?

I have studied the PAML/PAL2NAL and MEGA5 pipelines, and they perform multiple sequence alignment between sequences of the same type (i.e. either mRNA or protein), which is where my case differs. Should I translate the selected transcripts from 1. into protein and then perform the MSA?

Any suggestions are highly valued. Thanks in advance.

1) Convert all your transcripts to protein

2) Do a multiple sequence alignment of your protein sequences

3) Use PAL2NAL to convert your protein alignment to a codon alignment using the original transcripts

4) Use your codon alignment to produce a maximum likelihood tree

5) Use your codon alignment and your ML tree in PAML (codeml) to obtain the Ka/Ks ratio, which PAML reports as dN/dS (omega)
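To make steps 1 and 3 above more concrete, here is a minimal Python sketch of the idea: translate a coding sequence, then thread the original codons back onto the gapped protein alignment. This is only an illustration of what PAL2NAL does conceptually, not its actual implementation. The function names `translate` and `thread_codons` are made up for this example, and the sketch assumes each transcript has already been trimmed to its in-frame coding region (Trinity transcripts usually carry UTRs, so an ORF-finding step, e.g. with a tool such as TransDecoder, would normally come first). It also assumes the standard genetic code.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), in the canonical TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame coding sequence (hypothetical helper).

    Assumes the sequence starts at codon position 1. Codons containing
    ambiguous bases are translated as 'X'; stop codons as '*'.
    """
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds), 3)
    )


def thread_codons(aligned_protein, cds):
    """Map a gapped protein alignment row back onto its CDS (hypothetical helper).

    Each residue becomes its original codon and each '-' becomes '---',
    so the codon alignment keeps the protein alignment's columns.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(ch != "-" for ch in aligned_protein)
    if n_residues != len(codons):
        raise ValueError("protein row and CDS do not have the same number of codons")
    codon_iter = iter(codons)
    return "".join("---" if ch == "-" else next(codon_iter) for ch in aligned_protein)


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    cds_a = "ATGGCTAAA"
    cds_b = "ATGGCTGGAAAA"
    print(translate(cds_a), translate(cds_b))
    # Suppose the protein aligner returned these two rows:
    print(thread_codons("MA-K", cds_a))
    print(thread_codons("MAGK", cds_b))
```

In practice you would let a protein aligner produce the gapped rows and let PAL2NAL do the threading, since it also deals with cases such as stop codons and mismatches between the protein and the nucleotide sequence that this sketch ignores.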

Keep in mind that the better your sequence sampling, the more robust the results will be.

@Joseph Hughes: Thank you, sir. Clear instructions, very helpful!


