How to decide on a method for imputing missing genotypes in a VCF
2.7 years ago
nchuang ▴ 260

I am trying to understand the various ways to perform imputation with the published tools in the field. My understanding is that imputation is frequently used to fill in missing data when working with SNP arrays. However, what do you do if you have a large variant call file from whole genome sequence data with fewer than 1% missing calls (still thousands) after filtering on quality and removing genomes with high missing rates? I also removed variant sites that were 40-77% missing in my cohort. I want no missing genotypes because I plan to cluster on these calls, and some metrics cannot handle missing values.
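To make the filtering step concrete, here is a minimal sketch of computing per-site missing rates from GT strings and dropping sites above a cutoff. The function names, the toy data, and the 0.4 cutoff are all assumptions for illustration (0.4 is just the lower end of the range mentioned above), and it does not parse a real VCF.

```python
def is_missing(gt):
    """Return True if a VCF GT string (e.g. './.' or '.|.') has a missing allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(allele == "." for allele in alleles)


def site_missing_rate(site):
    """Fraction of samples with a missing genotype at one site."""
    return sum(is_missing(gt) for gt in site) / len(site)


def filter_sites(sites, max_missing=0.4):
    """Keep sites whose missing rate is at most max_missing (hypothetical cutoff)."""
    return [site for site in sites if site_missing_rate(site) <= max_missing]


# Toy data: three sites, four samples each (not real genotypes).
toy_sites = [
    ["0/0", "0/1", "./.", "1/1"],
    ["./.", "./.", "0/1", "./."],
    ["0|1", "1|1", "0|0", "0|0"],
]

rates = [site_missing_rate(site) for site in toy_sites]
kept = filter_sites(toy_sites, max_missing=0.4)
```

On a real cohort the same counts would normally come from a dedicated VCF tool rather than hand-rolled parsing.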

It looks like the simplest way to perform the imputation (short of just filling in the mean, mode, etc.) would be to use Beagle, as it doesn't require a reference panel. According to a recent comparison based on SNP chips, SHAPEIT/IMPUTE2 looks to be the best option when phased reference panels are used. What is the general approach when the missing rate is low and the data is WGS?
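For reference, the naive "mode" baseline mentioned above could look roughly like the sketch below: each missing call is replaced with the most common observed genotype at that site. This is only an illustration of the simple fill-in approach, not what Beagle or IMPUTE2 do. The function name and toy data are hypothetical, and it assumes unphased GT strings written consistently (e.g. always `0/1`, never `1/0`).

```python
from collections import Counter


def impute_site_mode(site):
    """Fill missing GT strings at one site with the most common observed genotype.

    A site with no observed genotypes is returned unchanged, since there is
    nothing to take the mode of.
    """
    observed = [gt for gt in site if "." not in gt]
    if not observed:
        return list(site)
    mode_gt = Counter(observed).most_common(1)[0][0]
    return [mode_gt if "." in gt else gt for gt in site]


# Toy data: one site with a single missing call, one site that is all missing.
site_a = ["0/0", "0/0", "./.", "0/1"]
site_b = ["./.", "./."]

imputed_a = impute_site_mode(site_a)
imputed_b = impute_site_mode(site_b)
```

Because this ignores linkage between neighbouring variants, it pulls rare genotypes toward the common one, which is the main reason haplotype-based tools are usually preferred.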

imputation impute2 shapeit beagle • 924 views
