Question: dN/dS output trouble in PAML
0
gravatar for munna.uppal
6.0 years ago by
munna.uppal10
United States
munna.uppal10 wrote:

Hi all,

I'm having an issue with the output of my Codeml ctl file. I'm attempting to calculate a single dN/dS (or, omega) value for an entire alignment of 11 sequences (I know there have been posts in the past asking for similar help, but I haven't been able to find a satisfactory solution). However, Codeml isn't outputting a single summary statistic, and instead gives me a sorta lower triangular matrix of values. I'm not sure what these values exactly represent but I imagine they're dN/dS values of some sort. 

----

-The relevant input:

runmode = 0, seqtype = 1, CodonFreq = 2, model = 0, NSsites = 0

fix_kappa = 0, kappa = 1, fix_omega = 0, omega = 1, ncatG = 10

-The relevant output (a snapshot):

Updated_Gene_HN_CI_7_D1
Updated_Gene_HN_CI_8_D1 0.0278 (0.0046 0.1646)
Updated_Gene_HN_CI_9_D1 0.0173 (0.0008 0.0440) 0.0320 (0.0046 0.1430)
Updated_Gene_HN_CI_10_D1 0.0734 (0.0030 0.0414) 0.0328 (0.0050 0.1510) 0.0640 (0.0034 0.0535)
Updated_Gene_HN_CI_11_D1 0.0291 (0.0030 0.1047) 0.0263 (0.0042 0.1594) 0.0317 (0.0030 0.0961) 0.0491 (0.0034 0.0697)

----

From a helpful resource with sample codeml.ctl code (http://www.researchgate.net/post/How_to_interpret_codeml_output_from_PAML_Phylogenetic_Analysis_by_Maximum_Likelihood), I've been told that I should be getting something like the following:

"Detailed output identifying parameters
kappa (ts/tv) = 1.00000
omega (dN/dS) = 1.00000"

But this is nowhere to be found in my output file. So my question is, what's going on? What are the values that I'm seeing currently? And why is my .ctl output not giving summary dN/dS or kappa values? Is there perhaps something wrong with my input or am I just missing something?

Any and all help is appreciated.

 

sequencing sequence gene • 4.5k views
ADD COMMENTlink modified 6.0 years ago • written 6.0 years ago by munna.uppal10
1
gravatar for Brice Sarver
6.0 years ago by
Brice Sarver3.5k
United States
Brice Sarver3.5k wrote:

I've had this happen before in my analyses. This is often the result of an unflagged error in your input file - codeml generates a pairwise matrix, one of the initial steps, but doesn't proceed to the ML estimation of omega. I had this happen a while back when my sample names did not conform exactly to Phylip specifications and again when my samples started with a number. My first suggestion would be to shorten the names to 10 characters or less and see what happens. This is easily accomplished with sed:

cat YOURFILE.fa | sed -e s/Updated_Gene_HN//g > YOURFILE.renamed.fa

This reads the file, finds every instance of "Updated_Gene_HN" and replaces it with nothing, globally, then writes the result to a new file. This can be iterated easily in bash, too, if you need it.

I also needed to do some single-rate (i.e., global) omega calculations a couple of weeks ago. I'm pasting my control file below, though it looks like yours is fine. This file executed over 15,000 times, so I'm confident it works :). I've changed the names of the seqfile and outfile; I also provided my own tree.

seqfile = FASTAFILENAME * sequence data filename
     treefile = tree.phy  * tree structure file name
      outfile = FASTAFILENAME.codeml          * main result file name

        noisy = 9  * 0,1,2,3,9: how much rubbish on the screen
      verbose = 1  * 0: concise; 1: detailed, 2: too much
      runmode = 0  * 0: user tree;  1: semi-automatic;  2: automatic
                   * 3: StepwiseAddition; (4,5):PerturbationNNI; -2: pairwise

      seqtype = 1  * 1:codons; 2:AAs; 3:codons-->AAs
    CodonFreq = 2  * 0:1/61 each, 1:F1X4, 2:F3X4, 3:codon table
*        ndata = 1
        clock = 0  * 0:no clock, 1:clock; 2:local clock; 3:CombinedAnalysis
       aaDist = 0  * 0:equal, +:geometric; -:linear, 1-6:G1974,Miyata,c,p,v,a
   aaRatefile = dat/jones.dat  * only used for aa seqs with model=empirical(_F)
                   * dayhoff.dat, jones.dat, wag.dat, mtmam.dat, or your own

        model = 0
                   * models for codons:
                       * 0:one, 1:b, 2:2 or more dN/dS ratios for branches
                   * models for AAs or codon-translated AAs:
                       * 0:poisson, 1:proportional, 2:Empirical, 3:Empirical+F
                       * 6:FromCodon, 7:AAClasses, 8:REVaa_0, 9:REVaa(nr=189)

      NSsites = 0  * 0:one w;1:neutral;2:selection; 3:discrete;4:freqs;
                   * 5:gamma;6:2gamma;7:beta;8:beta&w;9:betaγ
                   * 10:beta&gamma+1; 11:beta&normal>1; 12:0&2normal>1;
                   * 13:3normal>0

        icode = 0  * 0:universal code; 1:mammalian mt; 2-10:see below
        Mgene = 0
                   * codon: 0:rates, 1:separate; 2:diff pi, 3:diff kapa, 4:all diff
                   * AA: 0:rates, 1:separate

    fix_kappa = 0  * 1: kappa fixed, 0: kappa to be estimated
        kappa = 2  * initial or fixed kappa
    fix_omega = 0  * 1: omega or omega_1 fixed, 0: estimate
        omega = .5 * initial or fixed omega, for codons or codon-based AAs

    fix_alpha = 1  * 0: estimate gamma shape parameter; 1: fix it at alpha
        alpha = 0 * initial or fixed alpha, 0:infinity (constant rate)
       Malpha = 0  * different alphas for genes
        ncatG = 8  * # of categories in dG of NSsites models

        getSE = 1  * 0: don't want them, 1: want S.E.s of estimates
 RateAncestor = 0  * (0,1,2): rates (alpha>0) or ancestral states (1 or 2)

   Small_Diff = .5e-6
    cleandata = 0  * remove sites with ambiguity data (1:yes, 0:no)?
*  fix_blength = -1  * 0: ignore, -1: random, 1: initial, 2: fixed
        method = 0   * 0: simultaneous; 1: one branch at a time

* Genetic codes: 0:universal, 1:mammalian mt., 2:yeast mt., 3:mold mt.,
* 4: invertebrate mt., 5: ciliate nuclear, 6: echinoderm mt.,
* 7: euplotid mt., 8: alternative yeast nu. 9: ascidian mt.,
* 10: blepharisma nu.
* These codes correspond to transl_table 1 to 11 of GENEBANK.

I hope this resolves your issues. Please respond if you need more help - I use codeml all the time.

ADD COMMENTlink written 6.0 years ago by Brice Sarver3.5k

Hi Brice, thank you so much for your response. I tried implementing your code after shortening the name length for each sequence. After running the program, the Terminal tells me the following (which is what it said before changing the code and names, but I forgot to include in my original post):

"Seq #1 (CI_7_D1) is missing in the tree"

This is bizarre to me, as my tree contains the CI_7_D1 and its branch length when I open it in a text editor.

Overall, I am still getting the same output as before with no summary statistics. That being said, I'm wondering if the problem may lie with my input files. I’ve been constructing/aligning my sequences in MegAlign and exporting as simple fasta files. I’m constructing the tree in MEGA6, which only allows an export file type of .nwk. I was under the impression that a .nwk was equivalent to a .tree file. Could any of this potentially explain my issue?

ADD REPLYlink written 6.0 years ago by munna.uppal10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1879 users visited in the last hour