I need some advice on reconstructing haplotypes from SNP genotypes for different sets of individuals. Here is the situation:
I am using fastPHASE, and I have two levels of analysis:
- the global level: a set of 940 individuals
- the regional level: subsets of these 940 individuals, for example 100 individuals for Africa, 64 for America, and so on.
I have filtered the SNPs for MAF >= 0.05 and a known genotype for >= 90% of the individuals, separately for each region and for the global level (which gives different subsets of SNPs). So I am wondering whether I have to run fastPHASE for each region, or whether, for each region, I can extract the haplotypes for my subsets of SNPs and individuals from the phased data obtained at the global level. This is possible because the SNP set at the global level contains all the SNPs from the subset of each region.
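To make the extraction option concrete, here is a minimal Python sketch of what I have in mind. It assumes the globally phased haplotypes have already been parsed into memory. The function name `extract_region`, the dictionary layout and the toy SNP and individual IDs are all hypothetical, and nothing here reflects fastPHASE's own output format.

```python
def extract_region(phased, snp_ids, region_individuals, region_snps):
    """Subset globally phased haplotypes to one region's individuals and SNPs.

    phased: dict mapping individual ID -> (haplotype_a, haplotype_b),
            each haplotype a string with one allele per global SNP
            (hypothetical layout, not the fastPHASE file format).
    snp_ids: list of SNP IDs in the same order as the haplotype strings.
    """
    wanted = set(region_snps)
    missing = wanted - set(snp_ids)
    if missing:
        # A regional SNP absent from the global set cannot be extracted.
        raise ValueError(f"SNPs not in the global set: {sorted(missing)}")

    # Keep the global SNP order so the allele order along the haplotype is preserved.
    keep = [i for i, snp in enumerate(snp_ids) if snp in wanted]
    kept_snps = [snp_ids[i] for i in keep]

    subset = {}
    for ind in region_individuals:
        hap_a, hap_b = phased[ind]
        subset[ind] = (
            "".join(hap_a[i] for i in keep),
            "".join(hap_b[i] for i in keep),
        )
    return kept_snps, subset


# Toy data: 4 global SNPs, 3 individuals (made up for illustration).
snp_ids = ["rs1", "rs2", "rs3", "rs4"]
phased = {
    "ind1": ("ACGT", "ATGT"),
    "ind2": ("GCGA", "ACTA"),
    "ind3": ("ACTT", "GTGA"),
}

kept_snps, africa = extract_region(
    phased, snp_ids, ["ind1", "ind3"], ["rs1", "rs3", "rs4"]
)
print(kept_snps)
print(africa)
```

This only drops columns and rows from the global result. It says nothing about whether the phase inferred with the intermediate SNPs present matches what a separate regional run would give, which is exactly my question.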
Since fastPHASE is very time-consuming, extracting from the phased data obtained at the global level would save me a lot of time: I would not have to run fastPHASE for the 7 regions. On the other hand, I guess that fastPHASE does not behave the same way when intermediate SNPs are present (extracting from haplotypes phased at the global level) as when they are not (running fastPHASE for each region). How large would you expect the difference to be?
I do not know if this is clear enough, or whether you have a definitive answer. Anyway, thanks for your help!