Question: How to decide on a method for imputing missing genotypes in a VCF
16 months ago by nchuang (United States):

I am trying to understand the various ways to perform imputation with the published tools available in the field. My understanding is that imputation is frequently used to fill in missing data when working with SNP arrays. However, what do you do if you have a large variant call file from whole-genome sequencing (WGS) data with fewer than 1% missing calls (still thousands) after filtering for quality and removing genomes with high missing rates? I also removed sites whose calls were 40-77% missing in my cohort. I want no missing genotypes at all, because I plan to try clustering on these calls and some metrics cannot handle missing values.
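For concreteness, the site-level missingness filter described above can be sketched in plain Python. This is only an illustration: the function names and the 0.4 cutoff are assumptions chosen to mirror the 40-77% range mentioned, and it expects uncompressed, tab-separated VCF lines with a `GT` entry in the FORMAT column.

```python
def site_missing_rate(vcf_line):
    """Fraction of samples whose GT is missing on one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")  # FORMAT is the 9th column
    samples = fields[9:]
    missing = 0
    for sample in samples:
        gt = sample.split(":")[gt_index]
        # Count the call as missing if any allele is "." (e.g. "./." or ".")
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            missing += 1
    return missing / len(samples)


def filter_sites(vcf_lines, max_missing=0.4):
    """Keep header lines and sites whose missing rate is below max_missing.

    max_missing=0.4 is an assumed cutoff, not a recommended value.
    """
    kept = []
    for line in vcf_lines:
        if line.startswith("#") or site_missing_rate(line) < max_missing:
            kept.append(line)
    return kept
```

Calling `filter_sites(lines)` on the lines of a VCF returns the header plus the retained sites; the genotypes that are still missing at those sites are the ones that would need imputing.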

It looks like the simplest way to perform the imputation (short of just using the mean, mode, etc.) would be to use Beagle, as it doesn't require a reference panel. According to a recent comparison based on SNP chips, SHAPEIT/IMPUTE2 looks to be the best option when phased reference panels are used. What is the general approach when the missing-call rate is low and the data is WGS?
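As a point of reference for the "mean, mode, etc." baseline, here is a minimal sketch of per-site mode imputation. It assumes genotypes have already been decoded to alternate-allele dosages (0, 1, 2) with `None` for a missing call; the function name and that encoding are illustrative assumptions. It ignores linkage disequilibrium entirely, so it is not what Beagle or SHAPEIT/IMPUTE2 do.

```python
from collections import Counter


def impute_site_mode(genotypes):
    """Fill missing calls at one site with the most common observed genotype.

    genotypes: list of alt-allele dosages (0, 1, 2), None marks a missing call.
    """
    observed = [g for g in genotypes if g is not None]
    if not observed:
        raise ValueError("cannot impute a site with no observed genotypes")
    counts = Counter(observed)
    # Most frequent genotype; ties go to the smaller dosage so the
    # result is deterministic.
    mode = min(counts, key=lambda g: (-counts[g], g))
    return [mode if g is None else g for g in genotypes]
```

For example, `impute_site_mode([0, None, 1, 0, None, 2])` fills both gaps with 0, the most common observed dosage at that site.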
