Question

Dn/Ds Ratio - Paml : How Is It Possible To Obtain Negative Values For Dn Or Ds ?

13.1 years ago

Francois Olivier Hébert ▴ 280

Hi,

I have a set of 383 coding sequences containing many SNPs. I only have two species (2 populations representing incipient species). I used PAML to estimate the dn/ds ratio for each sequence. For some of the sequences, I get negative values for dn, ds or dn/ds.

I know from the FAQ PDF file that comes with PAML's latest distribution that -1.000 means infinity. So if you have dn > 0 and ds = 0, dividing dn by ds gives -1.0000, i.e. infinity. But what about a result such as this:

Nei & Gojobori 1986. dN/dS (dN, dS)
(Note: This matrix is not used in later ML. analysis.
Use runmode = -2 for ML pairwise comparison.)

allele1             
allele2             -0.6057 (-1.0000 1.6509)

How is it possible to have an infinite value for dn OR ds? dn = -1.0000 or ds = -1.0000 seems really strange to me. How should I interpret these negative values? My guess is that it means there are no synonymous or non-synonymous sites in the sequence, so PAML divides 0 by 0 and returns -1.0000. Example: if there are 0 synonymous mutations and 0 synonymous sites, the ratio of synonymous mutations to synonymous sites is reported as infinity (-1.0000) because it is a division by 0.

Am I wrong? If not, should I simply replace -1.0000 with 0 whenever dn or ds equals -1.0000?

Thank you for any help!
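One thing the output above does show: the printed ratio is just the flag divided by ds, since -1.0000 / 1.6509 ≈ -0.6057. So the -0.6057 is not an estimate of anything. A minimal sketch of how such lines could be handled, assuming any negative dn or ds is a "not available" flag and should become a missing value rather than 0 (the `parse_pair` helper and its regular expression are hypothetical, written only for the line format shown above, and are not part of PAML):

```python
import re

# Hypothetical pattern for one pairwise entry of the NG86 matrix,
# e.g. "allele2   -0.6057 (-1.0000 1.6509)" -> ratio (dN dS)
PAIR = re.compile(r"(-?\d+\.\d+)\s*\(\s*(-?\d+\.\d+)\s+(-?\d+\.\d+)\s*\)")

def parse_pair(line):
    """Return dN, dS and dN/dS for one line, with flagged values set to None."""
    m = PAIR.search(line)
    if m is None:
        return None
    printed_ratio, dn, ds = (float(x) for x in m.groups())
    # Assumption: a negative distance is a flag, not a measurement.
    dn = dn if dn >= 0 else None
    ds = ds if ds >= 0 else None
    # Only compute the ratio when both distances are usable and dS > 0.
    omega = dn / ds if dn is not None and ds is not None and ds > 0 else None
    return {"dN": dn, "dS": ds, "omega": omega, "printed_ratio": printed_ratio}

row = parse_pair("allele2             -0.6057 (-1.0000 1.6509)")
print(row)
```

Keeping the flagged entries as missing values, instead of recoding them to 0, avoids counting them as sequences with no substitutions in any downstream summary.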

paml selection • 8.2k views

updated 13.0 years ago by Liam Thompson ▴ 140 • written 13.1 years ago by Francois Olivier Hébert ▴ 280

That's a really strange error! I'm sorry, I have no clue how to solve your problem. Your explanation seems logically correct, but it would mean that your sequences have 0 non-synonymous sites, and how could that be possible?

13.1 years ago by Martombo ★ 3.2k

score 1 · Answer 1 · 2012-07-09

From my (limited) understanding of the program's output, an infinite value means that there are one or more non-synonymous substitutions and no synonymous substitutions on the branch. An infinite value obviously makes the data difficult to analyse, so ideally a likelihood ratio test (LRT) should be performed (e.g. comparing the lnL of branch-site Model A against site Model 1, and of branch-site Model B against site Model 3). I hope this sheds some light.
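To make the LRT step concrete, here is a rough sketch of the arithmetic, assuming you already have the lnL of the null and alternative models from two separate runs. The `lrt` function and the lnL values are hypothetical placeholders, and the degrees of freedom depend on which pair of models you compare, so `df` is left as an input. The p-value uses the closed-form chi-square tail for 1 or 2 degrees of freedom only, to avoid extra dependencies.

```python
import math

def lrt(lnl_null, lnl_alt, df):
    """Likelihood ratio test: 2 * (lnL_alt - lnL_null) against chi-square(df)."""
    # Clamp at 0: a slightly negative statistic usually reflects optimisation noise.
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2.0))   # chi-square tail, 1 df
    elif df == 2:
        p = math.exp(-stat / 2.0)              # chi-square tail, 2 df
    else:
        raise ValueError("this sketch only handles df = 1 or df = 2")
    return stat, p

# Hypothetical log-likelihoods, for illustration only.
stat, p = lrt(lnl_null=-1000.0, lnl_alt=-997.0, df=2)
print(f"2*delta(lnL) = {stat:.3f}, p = {p:.4f}")
```

The point of going this way is that the test compares model fits directly, so it does not depend on a per-sequence ratio that may be flagged as -1.0000.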