Hello,
I am trying to perform genotype imputation on TCGA genotype data so that I can run a GWAS-style analysis. I used the Michigan Imputation Server with Minimac4 to impute all chromosomes (see Methods below for a full description).
I noticed that after imputation, the genotypes at many SNPs that were directly genotyped on the array have changed, so for many individuals they no longer match the original calls. Why does this happen? I thought imputation should leave observed genotypes unaffected. Any help is appreciated.
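One way to put a number on this inconsistency is to compare the GT calls at sites present in both the pre-imputation VCF and the imputed VCF. Below is a minimal pure-Python sketch, assuming biallelic diploid autosomal SNPs on the same genome build, sites matched by chromosome and position only, and uncompressed VCF text as input. The function names (`alt_count`, `read_genotypes`, `discordance`) and the toy data are hypothetical; the parser is deliberately simplistic (no multi-allelic handling, dosages ignored).

```python
from typing import Dict, Iterable, Optional, Tuple

# Per-site record: (REF, ALT, {sample_id: ALT-allele count, or None if missing})
Record = Tuple[str, str, Dict[str, Optional[int]]]


def alt_count(gt: str) -> Optional[int]:
    """ALT-allele count for a biallelic diploid GT such as '0/1' or '1|0'."""
    alleles = gt.replace("|", "/").split("/")
    if any(a in (".", "") for a in alleles):
        return None  # missing call
    return sum(int(a) for a in alleles)


def read_genotypes(lines: Iterable[str]) -> Dict[Tuple[str, int], Record]:
    """Parse uncompressed VCF text into {(chrom, pos): (REF, ALT, calls)}."""
    samples = []
    records = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs follow the FORMAT column
            continue
        chrom = fields[0][3:] if fields[0].startswith("chr") else fields[0]
        gt_index = fields[8].split(":").index("GT")
        calls = {
            s: alt_count(v.split(":")[gt_index]) for s, v in zip(samples, fields[9:])
        }
        records[(chrom, int(fields[1]))] = (fields[3], fields[4], calls)
    return records


def discordance(typed_lines: Iterable[str], imputed_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (non-missing calls compared, calls that differ) at shared sites."""
    typed = read_genotypes(typed_lines)
    imputed = read_genotypes(imputed_lines)
    compared = discordant = 0
    for key, (ref, alt, calls) in typed.items():
        if key not in imputed:
            continue  # site not present in the imputed output
        i_ref, i_alt, i_calls = imputed[key]
        if (ref, alt) == (i_ref, i_alt):
            swapped = False
        elif (ref, alt) == (i_alt, i_ref):
            swapped = True  # same SNP with REF/ALT coded the other way round
        else:
            continue  # alleles disagree; skip rather than guess
        for sample, g in calls.items():
            h = i_calls.get(sample)
            if g is None or h is None:
                continue
            if swapped:
                h = 2 - h  # re-express as a count of the typed file's ALT allele
            compared += 1
            discordant += int(g != h)
    return compared, discordant


# Toy example (hypothetical data): one real mismatch and one REF/ALT swap.
typed_example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "1\t100\trs1\tA\tG\t.\t.\t.\tGT\t0/1\t1/1",
    "1\t200\trs2\tC\tT\t.\t.\t.\tGT\t0/0\t./.",
]
imputed_example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t100\trs1\tA\tG\t.\tPASS\t.\tGT:DS\t0|1:0.98\t0|1:1.10",
    "1\t200\trs2\tT\tC\t.\tPASS\t.\tGT:DS\t1|1:1.90\t0|0:0.02",
]
result = discordance(typed_example, imputed_example)
print(result)  # (3, 1)
```

For real `.vcf.gz` files, each argument could be `gzip.open(path, "rt")`. Handling swapped REF/ALT explicitly matters here, because a simple allele-coding difference between the PLINK export and the imputed output would otherwise show up as spurious discordance.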
Thanks!
Methods: I downloaded the TCGA Affymetrix SNP6 data and converted it to PLINK v1.9 files. I removed SNPs with MAF < 1%, SNPs with HWE p < 1e-5, and SNPs or individuals with a call rate < 95%. I flipped SNPs on the negative strand to the positive strand with the snpflip tool and removed strand-ambiguous (A/T and C/G) SNPs. I also applied ancestry filtering so that only European samples remain. I used --recode vcf in PLINK to convert the .bed files to VCF. On the Michigan Imputation Server, I imputed against the 1000G Phase 1 v3 (no singletons) reference panel and chose the EUR population.
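To make the per-SNP filters above concrete, here is a minimal pure-Python sketch of the same thresholds (MAF < 1%, HWE p < 1e-5, SNP call rate < 95%, strand-ambiguous A/T and C/G SNPs). It is an illustration, not a replacement for PLINK, where these filters would normally be applied with `--maf 0.01`, `--hwe 1e-5`, `--geno 0.05` and `--mind 0.05`. It assumes each SNP's genotypes are already coded as 0/1/2 ALT-allele counts per sample; the function names are hypothetical, per-individual call rate and ancestry filtering are not shown, and the HWE function follows the exact test of Wigginton, Cutler and Abecasis (2005), which is the kind of exact test PLINK uses for `--hwe`.

```python
from typing import List, Optional

# Strand-ambiguous allele pairs: the complement of A/T is T/A and of C/G is G/C,
# so the strand cannot be inferred from the alleles alone.
AMBIGUOUS = {frozenset("AT"), frozenset("CG")}


def is_strand_ambiguous(a1: str, a2: str) -> bool:
    return frozenset((a1.upper(), a2.upper())) in AMBIGUOUS


def hwe_exact_p(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Two-sided SNP-HWE exact test (Wigginton, Cutler & Abecasis 2005)."""
    n_homr, n_homc = sorted((n_hom1, n_hom2))
    rare = 2 * n_homr + n_het  # copies of the rarer allele
    n = n_het + n_homr + n_homc  # number of called individuals
    if n == 0:
        return 1.0
    probs = [0.0] * (rare + 1)
    # Start near the most likely heterozygote count (same parity as `rare`).
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0  # unnormalized; rescaled by `total` below
    # Walk downward: two fewer hets means one more of each homozygote.
    het, homr = mid, (rare - mid) // 2
    homc = n - het - homr
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (homr + 1) * (homc + 1))
        het, homr, homc = het - 2, homr + 1, homc + 1
    # Walk upward from the starting point.
    het, homr = mid, (rare - mid) // 2
    homc = n - het - homr
    while het <= rare - 2:
        probs[het + 2] = probs[het] * 4.0 * homr * homc / ((het + 2) * (het + 1))
        het, homr, homc = het + 2, homr - 1, homc - 1
    total = sum(probs)
    # p-value: total probability of configurations no more likely than observed.
    p = sum(x for x in probs if x <= probs[n_het]) / total
    return min(1.0, p)


def snp_passes_qc(
    alt_counts: List[Optional[int]],
    a1: str,
    a2: str,
    min_maf: float = 0.01,
    min_call_rate: float = 0.95,
    min_hwe_p: float = 1e-5,
) -> bool:
    """Apply the per-SNP thresholds from the Methods to one SNP (None = missing)."""
    called = [g for g in alt_counts if g is not None]
    if not called or len(called) / len(alt_counts) < min_call_rate:
        return False
    alt_freq = sum(called) / (2 * len(called))
    if min(alt_freq, 1 - alt_freq) < min_maf:
        return False
    if hwe_exact_p(called.count(1), called.count(0), called.count(2)) < min_hwe_p:
        return False
    return not is_strand_ambiguous(a1, a2)


# Hypothetical toy SNP: 100 samples, 2 missing calls, roughly in HWE.
toy = [0] * 60 + [1] * 32 + [2] * 6 + [None] * 2
keep_ag = snp_passes_qc(toy, "A", "G")  # passes every filter
keep_at = snp_passes_qc(toy, "A", "T")  # dropped as strand-ambiguous
print(keep_ag, keep_at)
```

The strand-ambiguity check runs last here only for readability; since it depends on the alleles alone, its position in the sequence does not change which SNPs are kept.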