Question

dN/dS output trouble in PAML

0

Entering edit mode

9.5 years ago

munna.uppal ▴ 10

Hi all,

I'm having an issue with the output of my Codeml ctl file. I'm attempting to calculate a single dN/dS (or, omega) value for an entire alignment of 11 sequences (I know there have been posts in the past asking for similar help, but I haven't been able to find a satisfactory solution).

However, Codeml isn't outputting a single summary statistic, and instead gives me a sorta lower triangular matrix of values. I'm not sure what these values exactly represent but I imagine they're dN/dS values of some sort.

The relevant input:

runmode = 0, seqtype = 1, CodonFreq = 2, model = 0, NSsites = 0
fix_kappa = 0, kappa = 1, fix_omega = 0, omega = 1, ncatG = 10

The relevant output (a snapshot):

Updated_Gene_HN_CI_7_D1
Updated_Gene_HN_CI_8_D1 0.0278 (0.0046 0.1646)
Updated_Gene_HN_CI_9_D1 0.0173 (0.0008 0.0440) 0.0320 (0.0046 0.1430)
Updated_Gene_HN_CI_10_D1 0.0734 (0.0030 0.0414) 0.0328 (0.0050 0.1510) 0.0640 (0.0034 0.0535)
Updated_Gene_HN_CI_11_D1 0.0291 (0.0030 0.1047) 0.0263 (0.0042 0.1594) 0.0317 (0.0030 0.0961) 0.0491 (0.0034 0.0697)

From a helpful resource with sample codeml.ctl code (http://www.researchgate.net/post/How_to_interpret_codeml_output_from_PAML_Phylogenetic_Analysis_by_Maximum_Likelihood), I've been told that I should be getting something like the following:

Detailed output identifying parameters
kappa (ts/tv) = 1.00000
omega (dN/dS) = 1.00000

But this is nowhere to be found in my output file. So my question is, what's going on? What are the values that I'm seeing currently? And why is my .ctl output not giving summary dN/dS or kappa values? Is there perhaps something wrong with my input or am I just missing something?

Any and all help is appreciated.

sequencing sequence gene • 6.3k views

ADD COMMENT • link updated 2.2 years ago by Ram 43k • written 9.5 years ago by munna.uppal ▴ 10

Ram · Answer 1 · 2014-11-12

I've had this happen before in my analyses. This is often the result of an unflagged error in your input file - codeml generates a pairwise matrix, one of the initial steps, but doesn't proceed to the ML estimation of omega. I had this happen a while back when my sample names did not conform exactly to Phylip specifications and again when my samples started with a number. My first suggestion would be to shorten the names to 10 characters or less and see what happens. This is easily accomplished with sed:

cat YOURFILE.fa | sed -e s/Updated_Gene_HN//g > YOURFILE.renamed.fa

This reads the file, finds every instance of "Updated_Gene_HN" and replaces it with nothing, globally, then writes the result to a new file. This can be iterated easily in bash, too, if you need it.

I also needed to do some single-rate (i.e., global) omega calculations a couple of weeks ago. I'm pasting my control file below, though it looks like yours is fine. This file executed over 15,000 times, so I'm confident it works :). I've changed the names of the seqfile and outfile; I also provided my own tree.

seqfile = FASTAFILENAME * sequence data filename
     treefile = tree.phy  * tree structure file name
      outfile = FASTAFILENAME.codeml          * main result file name

        noisy = 9  * 0,1,2,3,9: how much rubbish on the screen
      verbose = 1  * 0: concise; 1: detailed, 2: too much
      runmode = 0  * 0: user tree;  1: semi-automatic;  2: automatic
                   * 3: StepwiseAddition; (4,5):PerturbationNNI; -2: pairwise

      seqtype = 1  * 1:codons; 2:AAs; 3:codons-->AAs
    CodonFreq = 2  * 0:1/61 each, 1:F1X4, 2:F3X4, 3:codon table
*        ndata = 1
        clock = 0  * 0:no clock, 1:clock; 2:local clock; 3:CombinedAnalysis
       aaDist = 0  * 0:equal, +:geometric; -:linear, 1-6:G1974,Miyata,c,p,v,a
   aaRatefile = dat/jones.dat  * only used for aa seqs with model=empirical(_F)
                   * dayhoff.dat, jones.dat, wag.dat, mtmam.dat, or your own

        model = 0
                   * models for codons:
                       * 0:one, 1:b, 2:2 or more dN/dS ratios for branches
                   * models for AAs or codon-translated AAs:
                       * 0:poisson, 1:proportional, 2:Empirical, 3:Empirical+F
                       * 6:FromCodon, 7:AAClasses, 8:REVaa_0, 9:REVaa(nr=189)

      NSsites = 0  * 0:one w;1:neutral;2:selection; 3:discrete;4:freqs;
                   * 5:gamma;6:2gamma;7:beta;8:beta&w;9:beta&gamma;
                   * 10:beta&gamma+1; 11:beta&normal>1; 12:0&2normal>1;
                   * 13:3normal>0

        icode = 0  * 0:universal code; 1:mammalian mt; 2-10:see below
        Mgene = 0
                   * codon: 0:rates, 1:separate; 2:diff pi, 3:diff kapa, 4:all diff
                   * AA: 0:rates, 1:separate

    fix_kappa = 0  * 1: kappa fixed, 0: kappa to be estimated
        kappa = 2  * initial or fixed kappa
    fix_omega = 0  * 1: omega or omega_1 fixed, 0: estimate
        omega = .5 * initial or fixed omega, for codons or codon-based AAs

    fix_alpha = 1  * 0: estimate gamma shape parameter; 1: fix it at alpha
        alpha = 0 * initial or fixed alpha, 0:infinity (constant rate)
       Malpha = 0  * different alphas for genes
        ncatG = 8  * # of categories in dG of NSsites models

        getSE = 1  * 0: don't want them, 1: want S.E.s of estimates
 RateAncestor = 0  * (0,1,2): rates (alpha>0) or ancestral states (1 or 2)

   Small_Diff = .5e-6
    cleandata = 0  * remove sites with ambiguity data (1:yes, 0:no)?
*  fix_blength = -1  * 0: ignore, -1: random, 1: initial, 2: fixed
        method = 0   * 0: simultaneous; 1: one branch at a time

* Genetic codes: 0:universal, 1:mammalian mt., 2:yeast mt., 3:mold mt.,
* 4: invertebrate mt., 5: ciliate nuclear, 6: echinoderm mt.,
* 7: euplotid mt., 8: alternative yeast nu. 9: ascidian mt.,
* 10: blepharisma nu.
* These codes correspond to transl_table 1 to 11 of GENEBANK.

I hope this resolves your issues. Please respond if you need more help - I use codeml all the time.