Question

CodeML: Branch Specific Substitution Rates?

2

Entering edit mode

8.0 years ago

smrupp16 ▴ 20

It it possible to get an omega estimate for each branch in CodeML? Specifically, I am trying to determine the substitution rate between species that has occurred since they diverged at the previous branch. I have managed to get site-specific output, but that is bit to specific for me. I have tried pretty much every combination of options the control file that I can think of.

Here's the tree I've been using:

(Human, (#1 green_ano, #2 Chicken))

I have been trying to find a solution for this for quite a while, but I have yet to find anything. Thank you very much for any help.

alignment sequence • 2.9k views

ADD COMMENT • link 8.0 years ago by smrupp16 ▴ 20

0

Entering edit mode

Hello smrupp16!

We believe that this post does not fit the main topic of this site.

Problem has been addressed

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

ADD REPLY • link 8.0 years ago by smrupp16 ▴ 20

0

Entering edit mode

Don't close this post yourself. Instead accept the answer you posted below (use tick mark, and any other answers, if applicable). That would be the best way to indicate that the problem has found an answer.

ADD REPLY • link 8.0 years ago by GenoMax 141k

0

Entering edit mode

I re-opened the post. It was probably closed because OP thought that was how to mark something solved, though the "close post" option is for something entirely different (mainly off-topic posts).

ADD REPLY • link 8.0 years ago by SES 8.6k

score 1 · Answer 1 · 2016-04-15

1

Entering edit mode

8.0 years ago

SES 8.6k

PAML issues are tough to debug but it does look like your tree is not formatted correctly. This is explained in the documentation and there are example trees under the 'examples' directory of the distribution. Model selection is also important. You need to set model = 1 in the control file to fit the free-ratio models, which assumes an independent omega for each branch, whereas model = 2 allows branch groups to have several omega values.

edit: To be specific, I would try putting the labels after the IDs.

ADD COMMENT • link 8.0 years ago by SES 8.6k

0

Entering edit mode

I'm not sure that made much of a difference, but I think it got me pointed in the right direction.

ADD REPLY • link 8.0 years ago by smrupp16 ▴ 20

0

Entering edit mode

What model are you using and how does your tree look now? PAML if very picky about formats so it helps a lot to see the data you are using.

ADD REPLY • link 8.0 years ago by SES 8.6k

0

Entering edit mode

Sorry for the late reply. I got caught up in revising a paper.

Here's my tree now: (Human, (green_ano #1, Chicken #2))

I tried all three models, but I have been using 1 since you made the suggestion. I have been able to get some other results, but not quite what I'm looking for.

I have been using BioPython to convert fasta alignment files into sequential phylip formatted files:

3 1455 Human --GGCCTCCGGGAG... green_ano ATGGCTGCTGCTT... Chicken ATGGCTGCTG...

ADD REPLY • link 8.0 years ago by smrupp16 ▴ 20

0

Entering edit mode

Those are on different lines in the phylip file. I'm not sure why they're bunched up here.

ADD REPLY • link 8.0 years ago by smrupp16 ▴ 20

score 0 · Answer 2 · 2016-05-03

I finally, and somewhat accidentally, managed to solve the problem. I'm not sure exactly what solved the problem, so here is a summary of my input:

Tree file: (Human, (Green_ano #1, Chicken #2))

Control file (I have a script that adds the file names): seqfile = * sequence data filename treefile = * tree structure file name outfile = * main result file name

    noisy = 3  * 0,1,2,3,9: how much rubbish on the screen
  verbose = 0  * 0: concise; 1: detailed, 2: too much
  runmode = 0  * 0: user tree;  1: semi-automatic;  2: automatic
               * 3: StepwiseAddition; (4,5):PerturbationNNI; -2: pairwise

  seqtype = 1  * 1:codons; 2:AAs; 3:codons-->AAs
CodonFreq = 0  * 0:1/61 each, 1:F1X4, 2:F3X4, 3:codon table
    clock = 0  * 0:no clock, 1:clock; 2:local clock; 3:CombinedAnalysis
   aaDist = 0  * 0:equal, +:geometric; -:linear, 1-6:G1974,Miyata,c,p,v,a
 aaRatefile = dat/jones.dat  * only used for aa seqs with model=empirical(_F)
               * dayhoff.dat, jones.dat, wag.dat, mtmam.dat, or your own
    model = 1
               * models for codons:
                   * 0:one, 1:b, 2:2 or more dN/dS ratios for branches

  NSsites = 0  * 0:one w;1:neutral;2:selection; 3:discrete;4:freqs;
               * 5:gamma;6:2gamma;7:beta;8:beta&w;9:beta&gamma;
               * 10:beta&gamma+1; 11:beta&normal>1; 12:0&2normal>1;
               * 13:3normal>0

    icode = 0  * 0:universal code; 1:mammalian mt; 2-10:see below
fix_kappa = 0  * 1: kappa fixed, 0: kappa to be estimated
    kappa = 3  * initial or fixed kappa
fix_omega = 0  * 1: omega or omega_1 fixed, 0: estimate 
    omega = 1 * initial or fixed omega, for codons or codon-based AAs

fix_alpha = 1  * 0: estimate gamma shape parameter; 1: fix it at alpha
    alpha = 0. * initial or fixed alpha, 0:infinity (constant rate)
   Malpha = 0  * different alphas for genes
    ncatG = 8  * # of categories in dG of NSsites models

    getSE = 0  * 0: don't want them, 1: want S.E.s of estimates
  RateAncestor = 0  * (0,1,2): rates (alpha>0) or ancestral states (1 or 2)

 Small_Diff = .5e-6
  cleandata = 1  * remove sites with ambiguity data (1:yes, 0:no)?
   method = 1  * Optimization method 0: simultaneous; 1: one branch a time

My input files are in sequential phylip format.