Question

How To Calculate The Number Of Potential Synonymous And Nonsynonymous Sites

1

11.3 years ago

bingyu19821270 ▴ 40

Hi everyone,

Does anyone know how to calculate the number of synonymous sites for a given sequence? I know the principles, but I don't know whether there is any software or script that can do this. All the software I know of calculates dN, dS, Ka, and Ks... I would really appreciate it if anyone could help me out with this problem. Thank you in advance!

--Patricia

• 14k views

ADD COMMENT • link updated 7.2 years ago by a1ultima ▴ 840 • written 11.3 years ago by bingyu19821270 ▴ 40

1

I don't understand "for a certain sequence". Normally people calculate the number of syn. sites for a set of SNPs. What you would have to do is align that sequence to a reference and compute the variants.

ADD REPLY • link 11.3 years ago by Gabriel R. ★ 2.9k

0

To be clear - do you mean you are looking for a way to locate 4-fold degenerate sites (i.e. sites for which all mutations will be synonymous) in a single sequence?
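
If locating 4-fold degenerate sites in a single coding sequence is the goal, a minimal Python sketch might look like the one below. It assumes the standard genetic code, a sequence that is in frame from its first base, and 0-based site indices; the function name `fourfold_sites` is made up for illustration and does not come from any package mentioned in this thread.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def fourfold_sites(seq):
    """Return 0-based indices of 4-fold degenerate sites in an in-frame DNA sequence."""
    seq = seq.upper()
    sites = []
    # Walk complete codons only; trailing bases that do not fill a codon are ignored.
    for start in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[start:start + 3]
        if codon not in CODON_TABLE:
            continue  # skip codons with ambiguous bases such as N
        aa = CODON_TABLE[codon]
        for pos in range(3):
            # A site is 4-fold degenerate if every base at it gives the same amino acid.
            if all(CODON_TABLE[codon[:pos] + b + codon[pos + 1:]] == aa for b in BASES):
                sites.append(start + pos)
    return sites
```

For example, `fourfold_sites("ATGGCTTTT")` returns `[5]`: only the third position of the alanine codon GCT can take any base without changing the amino acid.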

ADD REPLY • link 11.3 years ago by David W 4.9k

1

I think, rather, that Patricia is using one of the so-called "approximate" or "counting" methods of looking for selection, in which the number of synonymous sites is counted by going through a sequence and identifying positions that are 4-fold degenerate (each counting as a single synonymous site), 2-fold degenerate positions (each counting as a third of a synonymous site, since one of the three possible changes there is synonymous), and so on.

Or at least that's how I understand these methods. The references I give in my answer below cover these issues.

The fact that one is (always? almost always?) interested in synonymous (or non-synonymous) changes between two (or more) sequences suggests that a method that compares 2+ sequences is likely to be a good way to address such questions.

In addition, the fact that software such as PAML (generally acknowledged as providing good ways of estimating such things) doesn't provide this kind of information (at least as far as I can tell, after looking at this a bit just now) further suggests that these per-sequence counts are unlikely to be of much interest on their own.

Patricia, it would be great to get some feedback on whether these answers/comments are useful for your question.

ADD REPLY • link 11.3 years ago by aidan-budd 1.9k

1

Hi aidan-budd, thank you very much for taking the trouble to help me; I really appreciate it. You are right: I'm looking for software to count the number of potential synonymous or nonsynonymous sites of a sequence by identifying 4-fold and 2-fold degenerate sites (Nei-Gojobori method). As the others say, this information is usually reported when the input is a sequence alignment file. The problem is that I don't want to estimate dN, dS, and the like, so there is no need to generate sequence alignments. I have seen people get this kind of information by writing programs such as Perl scripts, which I am not good at :(
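
For a single sequence with no alignment, the counting described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the standard genetic code, an in-frame sequence, stop codons and codons with ambiguous bases skipped, and changes that create a stop codon counted as nonsynonymous (some programs treat those differently, so totals may not match theirs exactly). The function name `potential_sites` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def potential_sites(seq):
    """Return (S, N): potential synonymous and nonsynonymous sites of one sequence."""
    seq = seq.upper()
    syn = Fraction(0)
    n_codons = 0
    for start in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[start:start + 3]
        aa = CODON_TABLE.get(codon)
        if aa is None or aa == "*":
            continue  # assumption: skip ambiguous codons and stop codons
        n_codons += 1
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                # Each of the 3 possible changes at a position contributes 1/3 of a site.
                if CODON_TABLE[mutant] == aa:
                    syn += Fraction(1, 3)
    # Every counted codon has 3 sites in total, so N is what remains after S.
    nonsyn = 3 * n_codons - syn
    return syn, nonsyn
```

For example, `potential_sites("ATGGCTTTT")` returns `(Fraction(4, 3), Fraction(23, 3))`: ATG contributes 0 synonymous sites, GCT contributes 1, and TTT contributes 1/3. Exact fractions are used so the totals are not affected by floating-point rounding; wrap them in `float()` if decimals are preferred.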

ADD REPLY • link 11.3 years ago by bingyu19821270 ▴ 40

score 1 · Answer 1 · 2013-01-15

Not something I have much experience of doing myself, but I can point you in the direction of a good paper to read through on this topic, by two authors who really do know a lot about the topic :)

Estimating Synonymous and Nonsynonymous Substitution Rates Under Realistic Evolutionary Models, Yang and Nielsen, MBE 1999

This discusses some of the methods for exploring these kinds of issues that depend on counting sites, along with references to other relevant articles and discussions.

Ziheng also covers these topics in his book "Computational Molecular Evolution" (which I find a great place to begin for non-mathematics-minded people [such as myself] who want to grapple with the maths) in the second chapter (section 2.5, looking at counting methods).

It looks like the MEGA software might do this for you, i.e. give you estimates of the number of sites, but if I were you, based on my faint memories of learning about these topics, I'd consider doing your analyses using maximum likelihood (ML), as implemented, for example, in the codeml program within the PAML package.

score 1 · Answer 2 · 2013-01-15

1

11.3 years ago

aindap ▴ 120

Hi there,

Have a look at this: Nei-Gojobori Method. MEGA can get you what you want, but also take a look at Chapter 4 of Nei & Kumar's textbook "Molecular Evolution and Phylogenetics".
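
As a rough summary of the site counting in that method, as I understand it: for one codon, let $f_i$ be the fraction of the three possible single-nucleotide changes at position $i$ that are synonymous, and let $r$ be the number of codons in the sequence. The symbols here are my own notation, so check them against the textbook before relying on them.

```latex
s = \sum_{i=1}^{3} f_i, \qquad n = 3 - s, \qquad
S = \sum_{j=1}^{r} s_j, \qquad N = 3r - S
% Worked example with the standard genetic code, codon TTA (Leu):
%   f_1 = 1/3 (only CTA is still Leu), f_2 = 0, f_3 = 1/3 (only TTG is still Leu)
%   so s = 2/3 and n = 7/3
```

Here $s$ and $n$ are the synonymous and nonsynonymous sites of a single codon, and $S$ and $N$ are the totals for the whole sequence.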

score 0 · Answer 3 · 2013-01-15

0

11.3 years ago

qiyunzhu ▴ 430

Try DataMonkey: http://www.datamonkey.org/. It is a web-based collection of tools with a common interface; the kernel is HyPhy. As a start you can try MEME. You will get graphical output, including substitutions by site.

score 0 · Answer 4 · 2017-02-01

How about this?

But if you would prefer to see code (instead of math), then have a look at changes.py (Python code). But perhaps my documentation was not clear, so just ask me for clarification.

Or have a look at some of the links mentioned in a related Biostars post.

If none of the above works, then I will try to find the book I used to help code my own dnds.py script, and get back to you with an ISBN.