Question: How To Calculate The Number Of Potential Synonymous And Nonsynonymous Sites
7.8 years ago by
bingyu1982127040 wrote:

Hi everyone,

Does anyone know how to calculate the number of synonymous sites for a given sequence? I know the principles, but I don't know whether there is any software or script that can do this. All the software I know of calculates dN, dS, Ka, and Ks instead. I would really appreciate it if anyone could help me out with this problem. Thank you in advance!


modified 3.7 years ago by a1ultima • written 7.8 years ago by bingyu19821270

I don't understand "for a certain sequence". Normally people calculate the number of syn. sites for a set of SNPs. What you would have to do is align that sequence to a reference and compute the variants.

written 7.8 years ago by Gabriel R.

To be clear: do you mean you are looking for a way to locate 4-fold degenerate sites (i.e. sites at which all mutations are synonymous) in a single sequence?

written 7.8 years ago by David W
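
For the single-sequence case asked about above, locating 4-fold degenerate sites can be sketched in a few lines of Python. This is only an illustration, not code from any package mentioned in this thread: it assumes the standard genetic code (NCBI translation table 1), an in-frame coding sequence, and a hypothetical function name `fourfold_degenerate_sites`. Codons containing ambiguous bases are skipped.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons run TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def fourfold_degenerate_sites(cds):
    """Return 0-based positions in `cds` at which any base gives the same amino acid."""
    cds = cds.upper().replace("U", "T")
    sites = []
    # Walk the sequence codon by codon, ignoring a trailing partial codon.
    for start in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[start:start + 3]
        if codon not in CODON_TABLE:  # ambiguous base such as N: skip
            continue
        for pos in range(3):
            # Amino acids reachable by placing each of the four bases here.
            encoded = {CODON_TABLE[codon[:pos] + b + codon[pos + 1:]] for b in BASES}
            if len(encoded) == 1:
                sites.append(start + pos)
    return sites


print(fourfold_degenerate_sites("ATGGCTTGGCTG"))
```

For the toy sequence ATG GCT TGG CTG, only the third positions of GCT (Ala) and CTG (Leu) qualify, so the function reports positions 5 and 11.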

I think, rather, that Patricia is using one of the sometimes-called "approximate" or "counting" methods of looking for selection, in which an attempt is made to count the number of synonymous sites by looking at sequences and identifying positions that are 4-fold degenerate (which might count as a single synonymous site), then maybe half a synonymous site for 2-fold degenerate sites, and so on.

Or at least that's how I understand these methods. The references I give in my answer below cover these issues.

The fact that one is (always? almost always?) interested in synonymous (or non-synonymous) changes between two (or more) sequences suggests that a method that compares 2+ sequences is likely to be a good way to address such questions.

In addition, the fact that software such as PAML (generally acknowledged as providing good ways of estimating such things) doesn't provide this kind of information (or at least as far as I can tell, after looking at this a bit just now), further highlights that estimating these kinds of things is unlikely to be something of high interest.

Patricia, would be great to get some feedback on whether these answers/comments are useful for your question.

written 7.8 years ago by aidan-budd

Hi aidan-budd, thank you very much for taking the trouble to help me. I really appreciate it. You are right. I'm looking for software to count the number of potential synonymous or nonsynonymous sites of a sequence by identifying 4-fold and 2-fold degenerate sites (the Nei-Gojobori method). As the others say, this information is usually reported only when the input is a sequence alignment file. The problem is that I don't want to estimate dN, dS, and the like, so there is no need to generate sequence alignments. I have seen some people get this kind of information by writing programs, such as Perl scripts, which I am not good at :(

written 7.8 years ago by bingyu19821270
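
A minimal sketch of the site count described above, written in Python rather than Perl. It follows the Nei-Gojobori idea: for each codon, s is the sum over the three positions of the fraction of single-base changes that leave the amino acid unchanged, and n = 3 - s. The function names are hypothetical, the standard genetic code is assumed, and two choices here are assumptions rather than settled convention: changes that create a stop codon are counted as nonsynonymous, and stop codons in the input are skipped. Other implementations may handle stop codons differently, which shifts the totals slightly.

```python
from fractions import Fraction
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons run TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites_in_codon(codon):
    """Return s for one codon as an exact fraction (0 to 3)."""
    s = Fraction(0)
    for pos in range(3):
        # Count the single-base changes at this position that keep the amino acid.
        unchanged = sum(
            1
            for b in BASES
            if b != codon[pos]
            and CODON_TABLE[codon[:pos] + b + codon[pos + 1:]] == CODON_TABLE[codon]
        )
        s += Fraction(unchanged, 3)
    return s


def potential_sites(cds):
    """Return (S, N): potential synonymous and nonsynonymous sites of `cds`."""
    cds = cds.upper().replace("U", "T")
    total_s = Fraction(0)
    n_codons = 0
    for start in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[start:start + 3]
        aa = CODON_TABLE.get(codon)
        if aa is None or aa == "*":  # assumption: skip ambiguous and stop codons
            continue
        total_s += synonymous_sites_in_codon(codon)
        n_codons += 1
    # Each counted codon contributes three sites in total, so N = 3r - S.
    return total_s, 3 * n_codons - total_s


S, N = potential_sites("ATGGCTTGGCTG")
print(float(S), float(N))
```

For a single TTA (Leu) codon this gives s = 2/3 and n = 7/3: one of the three changes at the first position and one of the three at the third position are synonymous. A 2-fold degenerate site therefore contributes 1/3 of a synonymous site under this scheme, and a 4-fold degenerate site contributes 1.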
7.8 years ago by
aidan-budd1.9k wrote:

Not something I have much experience of doing myself, but I can point you in the direction of a good paper to read through on this topic, by two authors who really do know a lot about the topic :)

Estimating Synonymous and Nonsynonymous Substitution Rates Under Realistic Evolutionary Models, Yang and Nielsen, MBE 2000

This discusses some of the methods for exploring these kinds of issues that depend on counting sites, along with references to other relevant articles and discussions.

Ziheng Yang also covers these topics in his book "Computational Molecular Evolution" (which I find a great place to begin for non-mathematics-minded people [such as myself] who want to grapple with the maths) in the second chapter (section 2.5, looking at counting methods).

It looks like the MEGA software might do this for you, i.e. give you estimates of the number of sites. If I were you, though, based on my faint memories of learning about these topics, I'd consider doing your analyses using maximum likelihood (ML), as implemented, for example, in the codeml program within the PAML package.

written 7.8 years ago by aidan-budd
7.8 years ago by
United States
aindap120 wrote:

Hi there,

Have a look at this: Nei-Gojobori Method. MEGA can get you what you want, but also take a look at Chapter 4 of Nei & Kumar's textbook "Molecular Evolution and Phylogenetics".

written 7.8 years ago by aindap
7.8 years ago by
qiyunzhu430 wrote:

Try DataMonkey: it is a web-based collection of tools with a common interface, built on HyPhy. As a start you can try MEME. You will get graphical output, including substitutions by site.

written 7.8 years ago by qiyunzhu
3.7 years ago by
a1ultima750 wrote:

How about this?

But if you would prefer to see code (instead of math), then have a look at (Python code). But perhaps my documentation was not clear, so just ask me for clarification.

Or have a look at some of the links mentioned in a related Biostars post.

If none of the above works, then I will try to find the book I used to help code my own script, and get back to you with an ISBN.

modified 3.7 years ago • written 3.7 years ago by a1ultima