Question

Calculating the number of synonymous and non-synonymous sites for genes


7.1 years ago

chlamystar • 0

Hi all,

I have a list of the numbers of synonymous and non-synonymous SNPs for a large set of genes (please note: I do not have the actual SNP sequences, just the counts).

I would now like to calculate the Ka/Ks ratio. To do so, I need the numbers of synonymous and non-synonymous sites in each gene. I can find a lot of information on how to calculate the Ka/Ks ratio, but not on how to work out how many sites have the potential to host either synonymous or non-synonymous substitutions.

Any advice on how to do so? Please bear in mind that I am not great at programming: I can run R, but at the moment I do not have access to things like MATLAB.

Thanks in advance for any feedback. I am a first-time user, so I am happy to give more info. Daisy

SNP synonymous kaks • 3.0k views

updated 7.1 years ago by Petr Ponomarenko ★ 2.8k • written 7.1 years ago by chlamystar • 0

score 0 · Answer 1 · 2017-03-04

Unless you are certain about the gene sequences and need very precise data, there is a workaround. Assume that the ratio of synonymous to nonsynonymous sites per genomic region (say, per 1000 nucleotides) is constant. Since Ka/Ks = (nonsynonymous SNPs / nonsynonymous sites) / (synonymous SNPs / synonymous sites), Ka/Ks is then approximately your nonsynonymous/synonymous SNP count ratio multiplied by that constant. To a first approximation the constant is the same for all genes, so you can rank genes by their nonsynonymous/synonymous SNP ratio as if it were Ka/Ks. You then need to "normalize" against genes that you are confident are evolving neutrally. Instead of normalizing, you can test whether the ratio for a particular gene or set of genes differs from that neutral "norm" using a chi-squared test or a similar statistic. The data might need to be filtered a bit to remove outliers. This is a simplified approach; more precise approaches can be used too.
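The workaround above can be sketched in a few lines. This is only an illustration under stated assumptions, not a definitive implementation: the gene names and counts are made up, the function names `chi2_2x2` and `compare_to_neutral` are hypothetical, and the pooled "reference" genes are simply assumed to be evolving neutrally. Python is used here for illustration; the same steps can be done in R, which has a built-in `chisq.test`.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (1 d.f., no continuity correction)
    for the 2x2 table [[a, b], [c, d]]. Returns (statistic, p_value)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        return 0.0, 1.0  # degenerate table: no evidence either way
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

def compare_to_neutral(counts, neutral_genes):
    """counts maps gene -> (nonsynonymous SNPs, synonymous SNPs).
    neutral_genes are ASSUMED to be under neutral selection."""
    # Pool the reference genes to get the neutral "norm".
    ref_n = sum(counts[g][0] for g in neutral_genes)
    ref_s = sum(counts[g][1] for g in neutral_genes)
    ref_ratio = ref_n / ref_s
    results = {}
    for gene, (n, s) in counts.items():
        if gene in neutral_genes:
            continue  # do not test the reference against itself
        # Nonsynonymous/synonymous ratio relative to the neutral norm:
        # a rough stand-in for Ka/Ks if site ratios are constant.
        relative = (n / s) / ref_ratio if s > 0 else float("inf")
        stat, p = chi2_2x2(n, s, ref_n, ref_s)
        results[gene] = {"relative_ratio": relative, "chi2": stat, "p": p}
    return results

# Made-up example counts: (nonsynonymous, synonymous) SNPs per gene.
counts = {
    "geneA": (30, 10),
    "geneB": (12, 24),
    "ref1": (20, 40),
    "ref2": (30, 60),
}
results = compare_to_neutral(counts, ["ref1", "ref2"])
for gene, r in sorted(results.items()):
    print(gene, round(r["relative_ratio"], 2), round(r["chi2"], 2), r["p"])
```

A relative ratio well above 1 with a small p-value suggests an excess of nonsynonymous SNPs compared with the reference set, and a ratio well below 1 suggests a deficit. With many genes, the p-values would need a multiple-testing correction, and genes with very few SNPs are better handled with an exact test.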