We have a list of SNPs that showed highly significant associations in independent GWAS. We are now looking at resequencing 1) genes that are in LD with these variants and 2) regions in LD with these variants. We will enrich for these regions using SureSelect to bring our targets down to a manageable size (after identifying exons, miRNA binding sites, ESTs, etc.).
My question is this: what criteria should be used to draw boundaries (say, a window) around a GWAS variant, particularly one in an intergenic region? I don't like the idea of using r2, since other SNPs in high r2 with the common variants detected by GWAS would presumably have come up already. I prefer the idea of looking within an LD block defined by some measure. Since r2 is not reliably estimated for low-MAF variants from HapMap samples, I thought of using D' (D prime). The issue is that runs of D'=1 can extend over hundreds of kilobases, clearly beyond where real recombination takes place. What other measures or empirical boundaries around a SNP could be used to refine the region subject to resequencing? Preferably something easy to explain in a paper. The goal of the project is to find low-MAF (and presumably higher-penetrance) variants in these regions that are not necessarily detectable by GWAS.
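
To make the r2 versus D' concern concrete, here is a minimal sketch of the standard two-locus definitions of D, D' and r2, computed from allele and haplotype frequencies. The function name `ld_measures` and the frequencies in the usage example are hypothetical and purely illustrative; they are not taken from our data or from HapMap.

```python
def ld_measures(p_a, p_b, p_ab):
    """Return (D, D', r2) for two biallelic loci.

    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    p_ab : frequency of the haplotype carrying both A and B
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    # The haplotype frequency must be compatible with the allele frequencies.
    if not (max(0.0, p_a + p_b - 1.0) <= p_ab <= min(p_a, p_b)):
        raise ValueError("p_ab is not feasible for the given allele frequencies")

    # Raw disequilibrium coefficient.
    d = p_ab - p_a * p_b

    # D' rescales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r2 is the squared correlation between the two loci.
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r2


if __name__ == "__main__":
    # Illustrative assumption: a common GWAS allele (30%) and a rare variant (1%)
    # that only ever occurs on haplotypes carrying the GWAS allele.
    d, d_prime, r2 = ld_measures(p_a=0.3, p_b=0.01, p_ab=0.01)
    print(f"D={d:.4f}  D'={d_prime:.3f}  r2={r2:.4f}")
```

With these made-up frequencies, D' is 1 while r2 is only about 0.02, which is the situation I am worried about: a rare variant sitting entirely on the GWAS haplotype would be missed by an r2 threshold but captured by a D'-based block.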