Hi all,
The 1000 Genomes paper indicates that each individual differs from the reference at about 10,000-11,000 non-synonymous sites and 10,000-12,000 synonymous sites. I am trying to get the distribution of these counts across regions so I can do some statistics. For example, I would like to know whether a particular region of interest is significantly enriched in nsSNPs compared with the genome-wide average. Is there a quick way to sample random segments of the exome and count the sSNPs and nsSNPs in each?
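To make the idea concrete, here is a minimal Python sketch of the kind of permutation test I have in mind. It assumes the exome has been flattened into a single coordinate axis of length `exome_length` and that nsSNP positions are already known on that axis. The function names and that flattened coordinate system are my own placeholders, not part of any existing tool, and the sketch ignores exon boundaries, gene structure, and local mutation rate.

```python
import bisect
import random


def count_in_window(sorted_positions, start, end):
    """Count positions p with start <= p < end in a sorted list."""
    lo = bisect.bisect_left(sorted_positions, start)
    hi = bisect.bisect_left(sorted_positions, end)
    return hi - lo


def empirical_enrichment(ns_positions, exome_length, region_start, region_end,
                         n_samples=10000, seed=0):
    """Compare the nsSNP count in one region with random same-size windows.

    Assumes (hypothetically) that all coordinates live on one flattened
    exome axis running from 0 to exome_length.
    Returns (observed count, list of null counts, empirical p-value).
    """
    ns_sorted = sorted(ns_positions)
    width = region_end - region_start
    observed = count_in_window(ns_sorted, region_start, region_end)

    rng = random.Random(seed)
    null_counts = []
    for _ in range(n_samples):
        # Draw a window start uniformly so the window fits inside the exome.
        start = rng.randrange(0, exome_length - width + 1)
        null_counts.append(count_in_window(ns_sorted, start, start + width))

    # One-sided p-value with an add-one correction so it is never exactly 0.
    hits = sum(1 for c in null_counts if c >= observed)
    p_value = (hits + 1) / (n_samples + 1)
    return observed, null_counts, p_value
```

Running the same function on sSNP positions would give the matching synonymous distribution, and `null_counts` could be plotted as a histogram to see the spread.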
Thanks, Joel