I am new to bioinformatics. I am trying to impute genotypes for a specific SNP, and I have managed to do so with IMPUTE2 using a phased reference panel (haplotypes generated with SHAPEIT, downloaded from the IMPUTE2 reference dataset). I also have experimental genotypes from a TaqMan assay for about two-thirds of the subjects. The imputed genotypes agree well with the experimental data. However, IMPUTE2 does not reliably predict genotypes in about 15% of the population, so I would like to explore more exhaustive methods.
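For reference, here is a minimal sketch of how the agreement between imputed and assayed genotypes could be quantified. It assumes each imputed genotype is a triplet of probabilities (AA, AB, BB) and each TaqMan genotype is coded as the number of B alleles (0, 1 or 2). The function names, the sample IDs and the 0.9 calling threshold are illustrative assumptions, not values taken from my actual run.

```python
def best_guess(probs, threshold=0.9):
    """Return the most likely genotype (0, 1 or 2 copies of allele B),
    or None when no probability reaches the calling threshold."""
    top = max(range(3), key=lambda g: probs[g])
    return top if probs[top] >= threshold else None


def concordance(imputed, assayed, threshold=0.9):
    """Compare imputed genotype probabilities with assayed genotypes.

    imputed: dict sample_id -> (P(AA), P(AB), P(BB))
    assayed: dict sample_id -> 0, 1 or 2
    Returns (call_rate, concordance) over samples present in both dicts.
    """
    compared = called = matched = 0
    for sample, genotype in assayed.items():
        if sample not in imputed:
            continue
        compared += 1
        call = best_guess(imputed[sample], threshold)
        if call is None:
            continue  # too uncertain to call; lowers the call rate only
        called += 1
        matched += int(call == genotype)
    call_rate = called / compared if compared else 0.0
    conc = matched / called if called else 0.0
    return call_rate, conc


# Hypothetical toy data, for illustration only.
imputed = {
    "s1": (0.98, 0.02, 0.00),
    "s2": (0.05, 0.93, 0.02),
    "s3": (0.50, 0.45, 0.05),  # ambiguous: not called at 0.9
    "s4": (0.00, 0.10, 0.90),
}
assayed = {"s1": 0, "s2": 1, "s3": 0, "s4": 1}
call_rate, conc = concordance(imputed, assayed)
```

Reporting the call rate separately from the concordance makes it clear whether the problem is uncertain calls (low call rate) or confident but wrong calls (low concordance).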
1) Using pre-phased haplotype files is supposed to cost some accuracy. A more accurate prediction can reportedly be obtained (at the cost of more computing time) by using unphased reference data (the -g_ref option in IMPUTE2). However, I can't find much information about this option. Where can I download the reference data (1000 Genomes)? How do I convert it to the genotype format that can be used with IMPUTE2?
2) According to the IMPUTE2 help: "This procedure is not recommended for unphased reference panels that have high SNP density, such as those that result from resequencing studies of population samples. In that situation, there may be statistical convergence issues that could decrease the imputation quality". What do they mean by "high SNP density"? Is the SNP density of the 1000 Genomes data so high that it is unsuitable for imputation with an unphased reference panel?
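On the conversion asked about in question 1, here is a rough sketch of what I think is needed. It assumes that the unphased reference has to be in the .gen genotype format (five leading columns: SNP ID, rsID, position, allele A, allele B, then three probabilities per individual for AA, AB and BB, with "0 0 0" for a missing genotype), and that the source is a VCF file with a GT field. The function name is hypothetical, using the chromosome as the first column is my own choice, and only biallelic sites are handled.

```python
def vcf_line_to_gen(line):
    """Convert one biallelic VCF data line to a .gen-style line.

    Phase information ('|') is deliberately discarded, since the
    target is an unphased genotype file.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, snp_id, ref, alt = fields[:5]
    if "," in alt:
        raise ValueError("multi-allelic site not supported in this sketch")

    # Locate the GT entry within the per-sample FORMAT string.
    gt_index = fields[8].split(":").index("GT")

    triplets = []
    for sample in fields[9:]:
        gt = sample.split(":")[gt_index].replace("|", "/")
        alleles = gt.split("/")
        if len(alleles) != 2 or "." in alleles:
            triplets.append("0 0 0")  # missing or non-diploid genotype
        else:
            n_alt = sum(a == "1" for a in alleles)
            triplets.append(("1 0 0", "0 1 0", "0 0 1")[n_alt])

    return " ".join([chrom, snp_id, pos, ref, alt] + triplets)


# Hypothetical record with four samples: hom-ref, het, hom-alt, missing.
example = "20\t14370\trs6054257\tG\tA\t29\tPASS\t.\tGT:GQ\t0|0:48\t1|0:48\t1/1:43\t./.:0"
gen_line = vcf_line_to_gen(example)
```

Header lines (those starting with "#") would need to be skipped before calling this, and I have not verified the output against IMPUTE2 itself.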