Hi. I am working with the phase 3 1000 Genomes VCF data (available here: http://www.internationalgenome.org/data) and need to estimate the numbers of synonymous and non-synonymous sites.
For my analysis, if I knew what type of site each variant falls at (e.g. a non-synonymous change at a 0-fold site, or a synonymous change at a 2-fold site), I could restrict the analysis to 0-fold and 4-fold sites and simply count those sites and the number of polymorphisms at them.
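For reference, the site classification itself is easy once the codon is known. The sketch below (plain Python, standard genetic code only, codons containing anything other than A/C/G/T skipped) labels each codon position by how many of the four possible bases encode the same amino acid: 1 means every change is non-synonymous (0-fold), 4 means every change is synonymous (4-fold). The names `fold_class` and `count_sites` are hypothetical helpers written for illustration, not from any library.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (last base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def fold_class(codon, pos):
    """Degeneracy of 0-based position `pos` in `codon`.

    Returns 0 if no substitution is synonymous, otherwise the number of
    bases (2, 3 or 4) at this position that encode the same amino acid.
    Stop codons are treated as just another 'amino acid' ('*').
    """
    codon = codon.upper()
    aa = CODE[codon]
    n_same = sum(CODE[codon[:pos] + b + codon[pos + 1:]] == aa for b in BASES)
    return 0 if n_same == 1 else n_same


def count_sites(cds):
    """Count 0-, 2-, 3- and 4-fold sites in an in-frame coding sequence."""
    cds = cds.upper()
    counts = {0: 0, 2: 0, 3: 0, 4: 0}
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon not in CODE:  # skip codons with N or other ambiguity codes
            continue
        for pos in range(3):
            counts[fold_class(codon, pos)] += 1
    return counts
```

For example, `count_sites("ATGCTG")` gives four 0-fold sites (all of ATG plus the middle base of CTG), one 2-fold site and one 4-fold site (the third base of CTG).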
However, I do not have complete codon information. The VCF files give the reference and alternative alleles but not the codon in which each allele lies, which I would need in order to determine whether a site is 0-fold, 4-fold, etc. Is there any way of obtaining this information? I know UCSC has this data, but their set of alleles seems to be incomplete compared with the data taken directly from 1000 Genomes.
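One possible route, assuming you can get CDS exon coordinates for each transcript (e.g. from a gene annotation file) and the matching reference sequence, is to map each VCF `POS` onto the spliced coding sequence yourself. The sketch below is hypothetical and deliberately minimal: it assumes a plus-strand transcript, 1-based inclusive CDS intervals sorted by position, and a `cds` string that starts at the first base of a codon. Minus-strand transcripts would need the exon order reversed and the sequence reverse-complemented.

```python
def codon_at(pos, exons, cds):
    """Return (codon, position_in_codon) for a 1-based genomic `pos`.

    exons: sorted list of (start, end) CDS intervals, 1-based inclusive,
           plus strand only (an assumption of this sketch).
    cds:   spliced coding sequence, i.e. the reference bases of `exons`
           concatenated in order, beginning at a codon's first base.
    Returns None if `pos` is not inside any CDS interval.
    """
    offset = 0  # number of CDS bases in the exons preceding the current one
    for start, end in exons:
        if start <= pos <= end:
            cds_index = offset + (pos - start)
            codon_start = cds_index - cds_index % 3
            # A codon may span a splice junction; slicing the spliced CDS handles that.
            return cds[codon_start:codon_start + 3], cds_index % 3
        offset += end - start + 1
    return None
```

With `exons = [(100, 104), (200, 210)]` and a 16-base `cds`, position 104 and position 200 fall in the same codon (bases 4 to 6 of the CDS), which straddles the intron.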
If this is not possible, I would be grateful for any other suggested methods that might work.