Hello everyone, how are you?
I’ve recently started using PyPGx, and I still have some beginner-level questions. I’m having trouble imputing genotype data: only some of the missing SNPs needed to build the haplotypes are being imputed, not all of them. I’m using Beagle for the imputation, as suggested, but I still haven’t been able to recover all the necessary SNPs. I suspect this might be related to the reference population I’m working with.
I’d really appreciate some guidance on the best path to follow. I’ve read quite a lot on the topic, but I can’t seem to move forward. Any advice would be very helpful!
What is the best reference VCF panel to use with Beagle to maximize the number of SNPs that can be imputed? Should I start with the 1000 Genomes Project, or would it be better to use a denser panel like HRC or TOPMed? Thank you for your attention.
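In general, a denser panel helps, because Beagle can only impute variants that are present in the reference panel. TOPMed offers the highest density and the best coverage of rare variants; it is the most powerful option but also the most resource-intensive, and it is normally accessed through the TOPMed Imputation Server rather than downloaded for a local Beagle run. HRC works well for samples of European ancestry, with high accuracy at a lower cost, but you risk missing variants specific to other populations. 1kGP is sparser than both and weaker for rare variants, so it is the most likely to leave gaps in the haplotypes, although it remains the easiest panel to download and use directly with Beagle. If you have access, begin with TOPMed.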
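
Before switching panels, it may help to check whether the SNPs you need are in the panel at all, since Beagle only imputes markers that exist in the reference. Here is a minimal Python sketch of that check. The function names are made up for illustration and are not part of PyPGx or Beagle. It assumes both files use the same genome build, and it strips a leading "chr" so that "chr22" and "22" compare as equal.

```python
import gzip


def read_sites(vcf_path):
    """Return the set of (chrom, pos, ref, alt) sites in a VCF (plain or .gz)."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    sites = set()
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            chrom, pos, ref, alt = fields[0], fields[1], fields[3], fields[4]
            # Assumption: normalise "chr22" to "22" so naming styles match.
            if chrom.startswith("chr"):
                chrom = chrom[3:]
            # Split multi-allelic records so each ALT allele is compared alone.
            for allele in alt.split(","):
                sites.add((chrom, int(pos), ref, allele))
    return sites


def missing_from_panel(required_sites, panel_sites):
    """Sites that are needed but absent from the panel (not imputable from it)."""
    return sorted(set(required_sites) - set(panel_sites))
```

To use it, call read_sites on the reference panel VCF and on a VCF listing the star-allele-defining variants for your gene, then pass both sets to missing_from_panel. Anything it returns cannot be recovered with that panel, whatever Beagle settings you use. If almost everything is reported as missing, a build mismatch (GRCh37 versus GRCh38) is a more likely cause than panel density.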