Question: How to choose a method for imputing missing genotypes in a VCF
12 months ago, nchuang (United States) wrote:

I am trying to understand the various ways to perform imputation with the published tools in the field. My understanding is that imputation is frequently used to fill in missing data when working with SNP arrays. However, what do you do if you have a large variant call file from whole-genome sequencing (WGS) data with less than 1% missing calls (still thousands) after filtering for quality and removing genomes with high missing rates? I also removed variants whose calls were 40-77% missing in my cohort. I want no missing genotypes because I plan to try clustering on these calls, and some metrics cannot handle missing values.
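To make the missingness filter concrete, here is a minimal sketch of the kind of per-site filtering I mean. It assumes each site is just a list of VCF GT strings; the function names and the 0.4 cutoff are hypothetical illustrations, not my actual pipeline.

```python
def is_missing(gt):
    """True when a VCF GT string such as './.' or '.|.' has any missing allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(a == "." for a in alleles)


def site_missing_rate(gts):
    """Fraction of samples with a missing genotype at one site."""
    return sum(is_missing(g) for g in gts) / len(gts)


def filter_sites(sites, max_missing):
    """Keep only sites whose missing rate is at most max_missing."""
    return [gts for gts in sites if site_missing_rate(gts) <= max_missing]


# Toy data: three sites, four samples each (made up for illustration).
sites = [
    ["0/0", "0/1", "1/1", "0/0"],  # no missing calls
    ["./.", "0/1", "./.", "0/0"],  # half of the calls missing
    ["0/0", "./.", "0/0", "0/0"],  # a quarter of the calls missing
]
kept = filter_sites(sites, max_missing=0.4)  # 0.4 is an illustrative cutoff
```

With this toy input only the second site is dropped. The third site survives the filter but still carries a missing call, which is exactly the residual missingness I am asking about.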

It looks like the simplest way to perform the imputation (short of just using the mean, mode, etc.) would be to use Beagle, as it does not require a reference panel. According to a recent comparison based on SNP chips, SHAPEIT/IMPUTE2 looks to be the best option when phased reference panels are used. What is the general approach when the missing-call rate is low and the data are WGS?
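For reference, the naive mode baseline I mentioned would be something like the sketch below. It assumes genotypes are already coded as alternate-allele dosages (0, 1, 2) with None for a missing call; the function name and the tie-breaking rule are my own choices for illustration. It treats every site independently, so it uses none of the haplotype information that a tool like Beagle relies on.

```python
from collections import Counter


def impute_site_mode(dosages):
    """Replace None with the most common observed dosage at this site."""
    observed = [d for d in dosages if d is not None]
    if not observed:
        raise ValueError("site has no observed genotypes")
    counts = Counter(observed)
    # Break ties toward the smaller dosage so the result is deterministic.
    mode = min(counts, key=lambda d: (-counts[d], d))
    return [mode if d is None else d for d in dosages]


# Toy matrix: rows are sites, columns are samples (made up for illustration).
matrix = [
    [0, 0, None, 1],
    [2, None, 2, 1],
    [None, 1, 1, 0],
]
imputed = [impute_site_mode(row) for row in matrix]
```

After this step every entry is an integer dosage, so distance metrics that reject missing values can be computed, at the cost of pulling each imputed call toward the majority genotype at that site.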


