Question: Imputation on two genotyping datasets: should I impute them separately or merge the two datasets first?
I'm doing an eQTL analysis. The genotyping data come from two sequencing centers that used the same type of SNP chip, but one center had a better SNP call rate than the other: about 100,000 more SNPs were called. I ran QC on the two datasets separately. QC also introduces differences in SNP content between the two datasets, meaning some SNPs are removed from one dataset but kept in the other.
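Before choosing a strategy, it can help to quantify how far the two QCed datasets have drifted apart. Below is a minimal Python sketch, assuming both datasets are in PLINK binary format (.bed/.bim/.fam) and that a given SNP carries the same variant ID in both .bim files. The toy variant IDs and positions are made up for illustration.

```python
from typing import Iterable, Set, Tuple


def variant_ids(bim_lines: Iterable[str]) -> Set[str]:
    """Collect variant IDs from the lines of a PLINK .bim file.

    Each .bim line has six whitespace-separated fields: chromosome,
    variant ID, genetic position (cM), base-pair position, allele 1,
    allele 2. The variant ID is the second field.
    """
    ids = set()
    for line in bim_lines:
        fields = line.split()
        if fields:  # skip blank lines
            ids.add(fields[1])
    return ids


def compare_variant_sets(
    bim_a: Iterable[str], bim_b: Iterable[str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (shared, only_in_a, only_in_b) variant IDs."""
    a = variant_ids(bim_a)
    b = variant_ids(bim_b)
    return a & b, a - b, b - a


if __name__ == "__main__":
    # Toy .bim contents (made up); with real data, pass open("file.bim") instead.
    center1 = [
        "1 var1 0 10583 G A",
        "1 var2 0 11508 A G",
        "1 var3 0 12783 C T",
    ]
    center2 = [
        "1 var1 0 10583 G A",
        "1 var3 0 12783 C T",
        "1 var4 0 13116 T G",
    ]
    shared, only1, only2 = compare_variant_sets(center1, center2)
    print(len(shared), sorted(only1), sorted(only2))  # 2 ['var2'] ['var4']
```

If the shared set is a large fraction of each dataset, merging first loses little; if it is small, the merge-first route discards many typed SNPs.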

Now I am stuck at the imputation step. Should I impute the two datasets separately and then combine the imputed data for the eQTL analysis, or should I first combine the two QCed datasets and impute them together? I don't know much about the principles of genotype imputation, so I hope someone can help me with this. Thanks!
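One way to merge before imputation is to keep only the variants typed in both datasets, so that neither batch has a block of variants missing for all of its samples, and to make sure each SNP's alleles are coded the same way in both. The sketch below assumes a hypothetical in-memory layout (a dict from variant ID to the two alleles plus per-sample allele counts); it is an illustration, not a replacement for a real merge tool. It only handles the case where the same two alleles are listed in opposite order; it does not resolve strand flips or strand-ambiguous A/T and C/G SNPs, which need extra care in practice.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout for one QCed batch:
# variant ID -> (allele1, allele2, genotypes), where each genotype is the
# count (0, 1, 2) of allele1 for one sample, or None if missing.
Batch = Dict[str, Tuple[str, str, List[Optional[int]]]]


def merge_on_shared_variants(a: Batch, b: Batch) -> Tuple[Batch, List[str]]:
    """Keep variants present in both batches and concatenate their samples.

    If batch B lists the same two alleles in the opposite order, its counts
    are recoded as 2 - count so they refer to batch A's allele1. Variants
    whose allele pairs do not match are dropped and reported, not guessed.
    Variants found in only one batch are left out of the merge.
    """
    merged: Batch = {}
    dropped: List[str] = []
    for vid in sorted(a.keys() & b.keys()):
        a1, a2, geno_a = a[vid]
        b1, b2, geno_b = b[vid]
        if (a1, a2) == (b1, b2):
            aligned_b = list(geno_b)
        elif (a1, a2) == (b2, b1):
            # Same alleles, opposite order: count allele1 instead of allele2.
            aligned_b = [None if g is None else 2 - g for g in geno_b]
        else:
            dropped.append(vid)
            continue
        merged[vid] = (a1, a2, list(geno_a) + aligned_b)
    return merged, dropped


if __name__ == "__main__":
    center1 = {"v1": ("G", "A", [0, 1, 2]), "v2": ("A", "G", [1, 1, 0])}
    center2 = {"v1": ("A", "G", [2, None]), "v3": ("C", "T", [0, 0])}
    merged, dropped = merge_on_shared_variants(center1, center2)
    print(merged)   # {'v1': ('G', 'A', [0, 1, 2, 0, None])}
    print(dropped)  # []
```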


I'd like to answer my own question in case someone else is in a similar situation. The GTEx (v6p) protocol used two different genotyping arrays: OMNI 5M for the pilot phase and OMNI 2.5M for the mid-phase. They first reduced the 5M data to its 2.5M portion of variants, and then did QC and imputation. However, I think the other approach is also feasible when the datasets share only a small fraction of variants, for example because they come from different array platforms or manufacturers. That's the approach I adopted: I did QC on each genotyping batch separately and then merged the batches after imputation.
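With the QC-then-impute-then-merge approach, the final merge is where most of the care goes. Below is a minimal sketch, assuming both batches were imputed against the same reference panel, variant IDs were harmonized as chrom:pos:ref:alt in both outputs (so matching IDs imply matching alleles), and each variant carries an imputation-quality score such as the Rsq that Minimac reports. The data layout is hypothetical, and the 0.3 cutoff is only an illustrative value, not a recommendation from this thread.

```python
from typing import Dict, List, Tuple

# Hypothetical layout for one separately imputed batch:
# variant ID -> (imputation R^2, per-sample ALT-allele dosages).
# Assumes IDs are harmonized as "chrom:pos:ref:alt" against the same
# reference panel in both batches.
ImputedBatch = Dict[str, Tuple[float, List[float]]]


def merge_imputed(
    a: ImputedBatch, b: ImputedBatch, min_r2: float = 0.3
) -> Dict[str, List[float]]:
    """Keep variants imputed in both batches with R^2 >= min_r2 in each,
    then concatenate batch A's sample dosages with batch B's.

    min_r2 = 0.3 is an illustrative default, not a recommended value.
    """
    merged: Dict[str, List[float]] = {}
    for vid in sorted(a.keys() & b.keys()):
        r2_a, dos_a = a[vid]
        r2_b, dos_b = b[vid]
        if r2_a >= min_r2 and r2_b >= min_r2:
            merged[vid] = dos_a + dos_b
    return merged


if __name__ == "__main__":
    batch1 = {
        "1:1000:A:G": (0.95, [0.0, 1.1]),
        "1:2000:C:T": (0.20, [1.0, 2.0]),  # poorly imputed in batch 1
        "1:3000:G:A": (0.80, [2.0, 1.9]),  # absent from batch 2
    }
    batch2 = {
        "1:1000:A:G": (0.90, [0.9]),
        "1:2000:C:T": (0.85, [0.1]),
        "1:4000:T:C": (0.99, [1.0]),
    }
    print(merge_imputed(batch1, batch2))  # {'1:1000:A:G': [0.0, 1.1, 0.9]}
```

Requiring adequate quality in both batches avoids keeping a variant whose dosages are informative for one center's samples but shrunk toward the mean for the other's, which could otherwise look like a batch difference in the eQTL model.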

