Help With Understanding The Output Of Codeml From The Paml Package
11.2 years ago
Joseph Hughes ★ 2.9k

I am conducting pairwise analyses in codeml with the getSE option and I get the following output:

lnL =-3838.516740
2.10195  1.24365  0.08719
SEs for parameters:
0.26108  0.18309  0.01564

t= 2.1020  S=   405.5  N=  1355.5  dN/dS= 0.0872  dN= 0.2054  dS= 2.3560
dN = 0.20539 +- 0.01576   dS = 2.35601 +- 0.37598 (by method 1)
dN = 0.20539 +- 0.01576   dS = 2.35601 +- 0.37598 (by method 2)


What is method 1 and method 2? Does the +- value correspond to the SE of dN and dS?

paml codeml selection
11.0 years ago
Joseph Hughes ★ 2.9k

I posted the question on the UCL discussion board for PAML and got an answer from Ziheng Yang (see below).

I am conducting pairwise analyses in codeml with the getSE option and I get the following output:

 lnL =-3838.516740
2.10195 1.24365 0.08719 (=> What are these values?)


The three values here are t, kappa and w (omega, the dN/dS ratio). They are duplicated below in the "formatted output". For example, t = 2.1020. You can ignore this line of output and look at the output below, or look at the program documentation doc/pamlDOC.pdf.
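To see how the unlabeled line lines up with the formatted output, here is a minimal Python sketch that reads both lines as quoted in the question. The regular expression and variable names are illustrative assumptions, not part of PAML. Note that in the lines quoted in this thread only t and w reappear in the formatted line; kappa does not.

```python
import re

# Lines copied from the codeml output quoted in the question.
raw_params = "2.10195  1.24365  0.08719"
formatted = "t= 2.1020  S=   405.5  N=  1355.5  dN/dS= 0.0872  dN= 0.2054  dS= 2.3560"

# Per the answer, the unlabeled line holds t, kappa and w (omega), in that order.
t, kappa, omega = (float(x) for x in raw_params.split())

# Longer labels come first so that "dN/dS" is not read as "dN" or "S".
pattern = re.compile(r"(dN/dS|dN|dS|t|S|N)\s*=\s*(\d+(?:\.\d+)?)")
fields = {name: float(value) for name, value in pattern.findall(formatted)}

# The formatted line rounds to four decimals, so compare with a tolerance.
print("t matches:    ", abs(t - fields["t"]) < 1e-4)
print("omega matches:", abs(omega - fields["dN/dS"]) < 1e-4)
print("kappa:        ", kappa)
```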

SEs for parameters:
0.26108 0.18309 0.01564
t= 2.1020  S=   405.5  N=  1355.5  dN/dS= 0.0872  dN= 0.2054  dS= 2.3560
dN = 0.20539 +- 0.01576   dS = 2.35601 +- 0.37598 (by method 1)
dN = 0.20539 +- 0.01576 dS = 2.35601 +- 0.37598 (by method 2)


What is method 1 and method 2? Does the +- value correspond to the SE of dN and dS?

They are two different ways (both approximate) of calculating the variance-covariance matrix. I am not sure whether they are explained somewhere, for example, in the documentation. If you are technically sophisticated, you can look at the routine VariancedSdN() in the file codeml.c, and also Appendix B: the delta technique in Yang (2006, Computational Molecular Evolution). Here are some notes in the program, which explain the basic idea.

This calculates the covariance matrix of dS & dN, using the difference approximation, from the covariance matrix of t and omega (vtw). com.kappa and com.pi are used. Sampling errors in parameters other than t and omega, such as kappa and pi[], are ignored.

JacobiSN = {{dS/dt, dS/dw}, {dN/dt,dN/dw}}

And yes, the +- values are the SEs.

ziheng yang
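The delta technique mentioned in the answer above can be sketched in Python as follows. This is an illustration under stated assumptions, not the VariancedSdN() routine from codeml.c. The mapping from (t, w) to (dS, dN) is an assumed simplification: it holds S and N fixed and uses t * (S + N) / 3 = S * dS + N * dN with dN = w * dS, a relation that the numbers in the quoted output satisfy. The covariance between t and w is not shown in the quoted output, so it is set to zero here; because of that, the resulting SEs should not be expected to match the ones codeml prints.

```python
import math

# Point estimates and SEs copied from the codeml output quoted in this thread.
T_HAT, OMEGA_HAT = 2.10195, 0.08719
SE_T, SE_OMEGA = 0.26108, 0.01564
S_SITES, N_SITES = 405.5, 1355.5


def ds_dn(t, omega, s=S_SITES, n=N_SITES):
    """Assumed mapping (t, omega) -> (dS, dN) with S and N held fixed.

    Uses t * (S + N) / 3 = S * dS + N * dN together with dN = omega * dS.
    Hypothetical helper for illustration, not the routine in codeml.c.
    """
    ds = t * (s + n) / (3.0 * (s + n * omega))
    return ds, omega * ds


def jacobian(f, t, omega, h=1e-6):
    """Central-difference Jacobian {{dS/dt, dS/dw}, {dN/dt, dN/dw}}."""
    t_hi, t_lo = f(t + h, omega), f(t - h, omega)
    w_hi, w_lo = f(t, omega + h), f(t, omega - h)
    return [
        [(t_hi[i] - t_lo[i]) / (2 * h), (w_hi[i] - w_lo[i]) / (2 * h)]
        for i in range(2)
    ]


def delta_cov(jac, vtw):
    """Delta technique: Cov(dS, dN) is approximately J * vtw * J^T."""
    jv = [[sum(jac[i][k] * vtw[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    return [[sum(jv[i][k] * jac[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Assumption: cov(t, omega) = 0, since the quoted output does not report it.
vtw = [[SE_T ** 2, 0.0], [0.0, SE_OMEGA ** 2]]

J = jacobian(ds_dn, T_HAT, OMEGA_HAT)
cov = delta_cov(J, vtw)
ds_hat, dn_hat = ds_dn(T_HAT, OMEGA_HAT)
se_ds, se_dn = math.sqrt(cov[0][0]), math.sqrt(cov[1][1])

print(f"dS = {ds_hat:.5f} +- {se_ds:.5f}")
print(f"dN = {dn_hat:.5f} +- {se_dn:.5f}")
```

Replacing the zero off-diagonal entries of vtw with the actual covariance of t and w, if you have it, is the only change needed to use the full matrix.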


This discussion led to a further question: why is there a high correlation between dS and its SE?


It received the following answer: Two points.

This is just intuition and not theory, but if your genes have similar lengths, I would expect large dS to have large SEs simply because large dS should tend to have large sampling errors. Things measured in kilometers tend to have larger (absolute) measurement errors than things measured in meters. It is a question of whether you expect the absolute errors (SE) or the relative errors (SE/dS) to be more similar among genes. I suspect that the truth is somewhere in the middle.

Second, the SE should not be 0 when the estimated dS is 0.

ziheng yang
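To make the distinction between absolute error (SE) and relative error (SE/estimate) in the first point concrete, here is a small Python sketch using the dN and dS values from the output quoted in the question. The dictionary layout and variable names are illustrative only.

```python
# (estimate, SE) pairs copied from the codeml output quoted in the question.
estimates = {"dN": (0.20539, 0.01576), "dS": (2.35601, 0.37598)}

# Relative error is the SE divided by the estimate it belongs to.
relative = {name: se / value for name, (value, se) in estimates.items()}

for name, (value, se) in estimates.items():
    print(f"{name}: SE = {se:.5f}, SE/{name} = {relative[name]:.3f}")
```

Running the same calculation across many genes and comparing the spread of SE with the spread of SE/dS is one way to check which of the two is more similar among genes.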

11.2 years ago
Phis ★ 1.1k

Outputs from CodeML and other PAML programs aren't always easy to understand (OK, that's an understatement) and sometimes refer to subtle and/or poorly documented features (if by documentation you mean the manual). If you really want to find out, you can probably read the source code; otherwise, unless you're in luck and someone on BioStar knows, you might want to try the PAML forum or, failing that, contact the author.

Sorry for not being able to be more helpful.