Calculate the number of non-synonymous and synonymous SNP sites
10.0 years ago

Hi, I have a VCF containing the SNPs between two genotypes of interest, and I have annotated those SNPs with the SnpEff annotation tool. SnpEff does a good job of annotating the SNPs, including counting the number of non-synonymous and synonymous SNPs. One thing I am interested in is calculating the dN/dS ratio for each chromosome for the two genotypes. I did that, but someone recently told me that before calculating the dN/dS ratio I should estimate the number of non-synonymous and synonymous sites. So I am wondering: how does one go about estimating non-synonymous and synonymous sites from a VCF file?

Thanks

Upendra

10.0 years ago
David W

The d_N/d_S ratio is the ratio of the non-synonymous and synonymous substitution _rates_ in a region (not the raw counts of changes). So your colleague is right to point out that you need to know the number of sites that could generate each type of change: roughly, d_N starts from the number of non-synonymous differences divided by the number of non-synonymous sites, and likewise for d_S. As it turns out, turning the counts into rates is not as straightforward as you might think. You can read something like Hurst (2002), "The Ka/Ks ratio: diagnosing the form of sequence evolution", Trends in Genetics 18:486-487, to work out which method to use, and then implement it.
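To make the idea of "sites" concrete, here is a minimal Python sketch of the Nei-Gojobori (1986) counting method, one of the simplest approaches. It assumes two aligned, in-frame coding sequences of equal length (e.g. the same CDS reconstructed for each genotype) and the standard genetic code. Each codon contributes a third of a synonymous site for every single-base change that keeps the amino acid; codons with more than one difference are averaged over mutational pathways, and the proportions pS and pN are then corrected with Jukes-Cantor. Counting changes to stop codons as non-synonymous, and skipping codons that are stops in either sequence, are simplifying choices made here, not the only convention. The function names are hypothetical.

```python
import math
from itertools import permutations, product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_sites(codon):
    """(synonymous, non-synonymous) sites of one codon.
    Single-base changes to a stop codon count as non-synonymous here."""
    aa = CODON_TABLE[codon]
    syn = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == aa:
                syn += 1 / 3
    return syn, 3 - syn


def codon_differences(c1, c2):
    """(synonymous, non-synonymous) differences between two codons, averaged
    over all stepwise pathways that avoid intermediate stop codons."""
    diff_pos = [i for i in range(3) if c1[i] != c2[i]]
    if not diff_pos:
        return 0.0, 0.0
    total_s = total_n = 0.0
    n_paths = 0
    for order in permutations(diff_pos):
        current, s, n, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODON_TABLE[nxt] == "*" and nxt != c2:
                valid = False  # pathway passes through a stop codon
                break
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                s += 1
            else:
                n += 1
            current = nxt
        if valid:
            total_s, total_n, n_paths = total_s + s, total_n + n, n_paths + 1
    if n_paths == 0:
        return 0.0, float(len(diff_pos))  # fallback: call every change non-synonymous
    return total_s / n_paths, total_n / n_paths


def jukes_cantor(p):
    """Jukes-Cantor correction; undefined (nan) once p reaches 0.75."""
    if p >= 0.75:
        return float("nan")
    return -0.75 * math.log(1 - 4 * p / 3)


def ng86(seq1, seq2):
    """Nei-Gojobori site and difference counts plus dN, dS for two aligned CDSs."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, in frame and of equal length")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if CODON_TABLE[c1] == "*" or CODON_TABLE[c2] == "*":
            continue  # this sketch skips stop codons, including nonsense SNPs
        s1, n1 = codon_sites(c1)
        s2, n2 = codon_sites(c2)
        S += (s1 + s2) / 2  # sites are averaged over the two sequences
        N += (n1 + n2) / 2
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    pS = Sd / S if S else float("nan")
    pN = Nd / N if N else float("nan")
    dS, dN = jukes_cantor(pS), jukes_cantor(pN)
    ratio = dN / dS if dS > 0 else float("nan")
    return {"S": S, "N": N, "Sd": Sd, "Nd": Nd, "pS": pS, "pN": pN,
            "dS": dS, "dN": dN, "dN/dS": ratio}
```

For example, `ng86("ATGGGGTTTAAA", "ATGGGATTTGAA")` finds one synonymous (GGG→GGA) and one non-synonymous (AAA→GAA) difference over 5/3 synonymous and 31/3 non-synonymous sites. For a per-chromosome value you would sum S, N, Sd and Nd over all CDSs on that chromosome before dividing. For real analyses, established programs such as PAML's yn00 implement this and more refined methods.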

BUT are you sure d_N/d_S is going to tell you anything? The statistic was developed to understand protein evolution between divergent species, and doesn't tell us very much about protein evolution within populations.


Thanks David. I will look into the paper. When I talked to some of my colleagues here, they suggested that I could construct a genome for each genotype; once that is done, there is software for aligning the sequences and estimating the Ka/Ks ratio. I agree that the dN/dS statistic was developed to understand protein evolution between divergent species, but I would like to know how this statistic varies between the two genotypes I am interested in, since these genotypes are the parents of a mapping population.
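Constructing a genotype-specific sequence from the VCF can be sketched as below. This minimal Python sketch (the function name `genotype_cds` is hypothetical) applies biallelic SNPs from one sample column to a reference CDS. It assumes a single-exon, forward-strand CDS whose 1-based genomic start is known, and inbred (homozygous) parents, so only `1/1` calls are applied; indels, multiallelic sites, missing and heterozygous calls are skipped. For whole genomes, `bcftools consensus` does this job on a FASTA file.

```python
def genotype_cds(ref_cds, cds_start, chrom, vcf_lines, sample_index):
    """Return the CDS of one genotype by applying its homozygous-ALT SNPs.

    Assumes (hypothetically) a forward-strand, intron-free CDS starting at
    1-based position `cds_start` on `chrom`; `sample_index` is 0 for the
    first sample column after FORMAT.
    """
    seq = list(ref_cds.upper())
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if fields[0] != chrom:
            continue
        pos, ref, alt = int(fields[1]), fields[3].upper(), fields[4].upper()
        if len(ref) != 1 or len(alt) != 1:
            continue  # skip indels and multiallelic sites ("A,G") in this sketch
        offset = pos - cds_start
        if not 0 <= offset < len(seq):
            continue  # SNP lies outside this CDS
        fmt = fields[8].split(":")
        sample = fields[9 + sample_index].split(":")
        gt = sample[fmt.index("GT")].replace("|", "/")
        if gt != "1/1":
            continue  # keep the reference base for 0/0, het and missing calls
        if seq[offset] != ref:
            raise ValueError(f"REF mismatch at {chrom}:{pos}")
        seq[offset] = alt
    return "".join(seq)
```

Because only SNPs are applied, the two parental sequences stay aligned codon by codon and can be passed directly to a pairwise dN/dS estimator.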
