dN/dS Ratio - PAML: How Is It Possible To Obtain Negative Values For dN Or dS?
11.9 years ago

Hi,

I have a set of 383 coding sequences in which there is a whole bunch of SNPs. I only have two species (2 populations representing incipient species). I used PAML to estimate the dn/ds ratio for each sequence. For some of the sequences, I get negative values either for dn, ds or dn/ds.

I know from the FAQ PDF file that comes with PAML's latest distribution that -1.000 means infinity. So if you have dn > 0 and ds = 0, dividing dn by ds gives -1.0000, i.e. infinity. But then I get a result such as:

Nei & Gojobori 1986. dN/dS (dN, dS)
(Note: This matrix is not used in later ML. analysis.
Use runmode = -2 for ML pairwise comparison.)

allele1             
allele2             -0.6057 (-1.0000 1.6509)

How is it possible to have an infinite value for dn OR ds? A value of dn = -1.0000 or ds = -1.0000 seems really strange to me. How should I interpret these negative values? My guess is that it means there are no synonymous or non-synonymous sites in the sequence, so PAML divides 0 by 0 and returns -1.0000. Example: if there are 0 synonymous mutations and 0 synonymous sites, the ratio of synonymous mutations to synonymous sites is a division by 0, which is reported as infinity (-1.0000).

Am I wrong? If not, should I simply replace -1.0000 with 0 each time dn or ds equals -1.0000?

Thank you for any help!
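
For anyone post-processing this kind of output: in the example above, -0.6057 is just -1.0000 / 1.6509, so the reported ratio is computed from the flag value rather than being an independent estimate. A cautious alternative to overwriting -1.0000 with 0 is to mark such values as missing. The following is a minimal sketch, assuming the line layout shown in the matrix above and assuming that any negative dn or ds is a flag rather than a real estimate; `parse_ng86_line` is a hypothetical helper, not part of PAML.

```python
import math
import re

# Hypothetical helper (not part of PAML): parse one pairwise line such as
#   "allele2             -0.6057 (-1.0000 1.6509)"
# Layout assumed: name, dN/dS, then "(dN dS)" in parentheses.
LINE_RE = re.compile(
    r"^(\S+)\s+(-?\d+\.\d+)\s+\((-?\d+\.\d+)\s+(-?\d+\.\d+)\)"
)


def parse_ng86_line(line):
    """Return a dict with name, dN/dS, dN, dS; negative values become NaN."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # not a pairwise result line
    name = m.group(1)
    omega, dn, ds = (float(x) for x in m.group(2, 3, 4))

    # Assumption: a negative dN or dS is a flag value, so treat it as missing.
    dn = math.nan if dn < 0 else dn
    ds = math.nan if ds < 0 else ds

    # The ratio is unusable if it is negative or if either input is missing.
    if omega < 0 or math.isnan(dn) or math.isnan(ds):
        omega = math.nan
    return {"name": name, "dN/dS": omega, "dN": dn, "dS": ds}


result = parse_ng86_line("allele2             -0.6057 (-1.0000 1.6509)")
print(result)
```

Keeping the flagged values as NaN, rather than 0, avoids mixing undefined estimates with genuine zeros in any later summary statistics.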


That's a really strange error! I'm sorry, I have no clue how to solve your problem. Your explanation seems logically correct, but that would mean your sequences have 0 non-synonymous sites, and how could that be possible?

11.8 years ago
Liam Thompson ▴ 140

From my (limited) understanding of the program's output, an infinite value means that there are one or more non-synonymous substitutions and no synonymous substitutions on the branch. An infinite value obviously makes the data difficult to analyse, so ideally an LRT (likelihood ratio test) should be performed, e.g. comparing the lnL of branch-site Model A against site Model 1, and of branch-site Model B against site Model 3. I hope this sheds some light.
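
To illustrate the LRT mentioned here, this is a minimal sketch of the calculation: the statistic is twice the difference in lnL between the alternative and the null model, compared against a chi-square distribution. The function name `lrt`, the lnL values, and the degrees of freedom in the usage line are placeholders for illustration, not results from any real run; the degrees of freedom should be the difference in the number of free parameters between the two models being compared.

```python
import math


def lrt(lnl_null, lnl_alt, df):
    """Likelihood ratio test from two log-likelihoods.

    Returns (statistic, p_value). Only df = 1 or 2 are handled, because
    the chi-square tail has a simple closed form in those cases.
    """
    # 2 * (lnL_alt - lnL_null); clamp at 0 in case of slight optimiser noise.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        # Chi-square (df = 1) upper tail: erfc(sqrt(x / 2)).
        p_value = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        # Chi-square (df = 2) upper tail: exp(-x / 2).
        p_value = math.exp(-stat / 2.0)
    else:
        raise ValueError("this sketch only handles df = 1 or 2")
    return stat, p_value


# Placeholder lnL values, invented purely to show the call.
stat, p_value = lrt(lnl_null=-1234.56, lnl_alt=-1230.12, df=2)
print(f"2*delta(lnL) = {stat:.2f}, p = {p_value:.4f}")
```

For other degrees of freedom, a statistics library's chi-square survival function could be used in place of the closed-form expressions.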
