My question is this: are there any papers or other data that quantify how well low-frequency variants can be imputed from genotype data on common arrays, using the 1000 Genomes pilot 3 data as a reference? We need some hard estimates of how much imputation can be expected to improve over our HapMap 2 levels, particularly for rare coding SNPs.
HapMap 2, and even HapMap 3, are based on dbSNP build 126, so the variants they cover are far fewer than one fifth of the total now in dbSNP build 132. Although we haven't worked with the 1KG pilot 3 data, we have compared pilot 1 with HapMap 3, and our results corroborate the recent 1KG paper: the project's coverage extends well below the 1% MAF threshold conventionally used to define a SNP. This additional coverage has been loaded into dbSNP builds 130, 131 and 132, so if you want to estimate the "imputable variants" in a region of interest, you should consider using dbSNP as your reference repository.
Of course, if you need population frequency values, you will probably find well-established numbers in HapMap; but if you only need to estimate how many rare variants you would find while testing your arrays, dbSNP should do.
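As a rough illustration of the kind of regional estimate meant above, here is a minimal Python sketch. It assumes you have already exported variant records for your region (chromosome, position, minor allele frequency) from dbSNP or a 1KG release into simple tuples; the `Variant` type, the `count_by_maf` helper and the toy records are all hypothetical, and real dbSNP entries may lack frequency data altogether.

```python
from typing import Iterable, NamedTuple, Tuple


class Variant(NamedTuple):
    # Hypothetical record layout; not a dbSNP or 1KG file format.
    chrom: str
    pos: int
    maf: float  # minor allele frequency, between 0 and 0.5


def count_by_maf(
    variants: Iterable[Variant],
    chrom: str,
    start: int,
    end: int,
    threshold: float = 0.01,
) -> Tuple[int, int]:
    """Return (rare, common) counts for variants in [start, end] on chrom.

    A variant is counted as rare when its MAF is below the threshold
    (1% by default, matching the conventional SNP definition).
    """
    rare = common = 0
    for v in variants:
        if v.chrom == chrom and start <= v.pos <= end:
            if v.maf < threshold:
                rare += 1
            else:
                common += 1
    return rare, common


# Toy records for illustration only; these are not real dbSNP entries.
toy = [
    Variant("1", 1000, 0.002),
    Variant("1", 1500, 0.2),
    Variant("1", 2500, 0.008),
    Variant("2", 1200, 0.3),
]

print(count_by_maf(toy, "1", 900, 2000))  # (1, 1)
```

Comparing the rare count from a recent dbSNP build against the same region restricted to HapMap-era variants would give a crude idea of how much denser the newer catalogue is; it says nothing by itself about imputation accuracy.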
PS: our work on pilot 1 and on population statistics for rare variants is currently under review at BMC Bioinformatics, but if you would like to discuss anything in particular with us, please feel free to get in touch.