Hi all,
I want to construct a workflow that will allow me to impute small batches of SNPs that were untyped in my GWAS data (in .bed/.bim/.fam format).
Is it possible to meaningfully subset my reference panel (HapMap release 23) so that I only impute the ~8000 missing SNPs? Would it be a mistake to first subset the reference to those 8000 SNPs and their tag SNPs, then merge the reference with the GWAS data, and then impute genotypes?
Thanks for the help in advance!
Thanks for this insight! As a follow-up, are you suggesting that whole-GWAS imputation is the way to go, or can I still pre-filter my reference set to speed up the imputation?
Impute2 should support this sort of variant filtering: see http://mathgen.stats.ox.ac.uk/impute/impute_v2.html#ex4 .
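In case it helps, here is a rough Python sketch of how you might build the list of SNPs to impute: it takes your list of wanted SNP IDs and drops anything already typed in the .bim file. The function names and file names (gwas.bim, wanted_snps.txt, snps_to_impute.txt) are placeholders I made up, and I'm assuming the wanted list has one rsID per line. How the resulting list gets passed to the imputation program is not shown here; check the linked example for the relevant option.

```python
def read_bim_ids(bim_path):
    """Return the set of variant IDs (second column) from a PLINK .bim file."""
    ids = set()
    with open(bim_path) as handle:
        for line in handle:
            fields = line.split()
            # .bim columns: chromosome, variant ID, cM position, bp position, allele 1, allele 2
            if len(fields) >= 2:
                ids.add(fields[1])
    return ids


def untyped_targets(bim_path, wanted_path, out_path):
    """Write the wanted SNP IDs that are absent from the .bim file, one per line.

    Assumes wanted_path holds one SNP ID per line (hypothetical input format).
    Returns the list of IDs written, in input order with duplicates removed.
    """
    typed = read_bim_ids(bim_path)
    with open(wanted_path) as handle:
        wanted = [line.strip() for line in handle if line.strip()]
    # dict.fromkeys keeps first-seen order while dropping repeated IDs
    missing = [snp for snp in dict.fromkeys(wanted) if snp not in typed]
    with open(out_path, "w") as out:
        for snp in missing:
            out.write(snp + "\n")
    return missing


if __name__ == "__main__":
    # Placeholder file names; replace with your own paths.
    targets = untyped_targets("gwas.bim", "wanted_snps.txt", "snps_to_impute.txt")
    print(f"{len(targets)} SNPs still need imputing")
```

This only matches on SNP ID, so if your GWAS data and the reference use different identifiers for the same position, you would need to match on chromosome and base-pair position instead.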