Which reference panel to use with Impute2
6.6 years ago
raja.venks ▴ 20

Dear All

I am new to bioinformatics. I am trying to impute genotypes for a specific SNP, and I have managed to impute it using a phased reference panel (haplotypes generated with SHAPEIT, downloaded from the IMPUTE2 reference dataset). I have genotype data from a TaqMan assay for about two-thirds of the subjects. There is good agreement between the imputed genotypes and the experimental data. However, IMPUTE2 does not reliably predict genotypes in 15% of the population, so I would like to explore more exhaustive methods.

1) Using pre-phased haplotype files is supposed to cost some accuracy. A more accurate prediction can be obtained (at the cost of more computing time) by using unphased reference data (the -g_ref option in IMPUTE2). However, I can't find much information about this option. Where can I download the reference data (1000 Genomes)? How do I convert it to the genotype format that can be used with IMPUTE2?

2) As per the IMPUTE2 help: "This procedure is not recommended for unphased reference panels that have high SNP density, such as those that result from resequencing studies of population samples. In that situation, there may be statistical convergence issues that could decrease the imputation quality". What do they mean by "high SNP density"? Is the SNP density of the 1000 Genomes data so high that it is unsuitable for imputation with an unphased reference panel?

Thank you

impute2 1000 genome format • 3.9k views
5.3 years ago
tulsi1192 • 0

You've probably figured it out by now, but just in case someone comes looking for answers...

You can download the phased 1000 Genomes reference datasets from the IMPUTE2 website, already in the format IMPUTE2 expects (.haps, .sample). Click a dataset (e.g. Phase 3), scroll down that page, and download the .tgz file (1000GP_Phase3.tgz). Be aware that these files are quite large.
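For the unphased route asked about in the question (-g_ref), the reference genotypes would have to be in IMPUTE2's genotype file format: five leading columns (SNP ID, rsID, position, allele A, allele B) followed by three probabilities per individual (AA, AB, BB). Below is a minimal Python sketch of that conversion starting from VCF lines. The function names are hypothetical, and it rests on several assumptions: sites are biallelic, hard genotype calls are in the GT field, the chromosome name is used for the free-form first column, and missing calls are written as "0 0 0". It is an illustration, not a tested pipeline, and the caveat quoted in the question about high-density unphased panels still applies.

```python
# Sketch: turn biallelic VCF data lines into IMPUTE2 genotype-format lines.
# Hard calls become probabilities of 1 for the called genotype.
GT_TO_PROBS = {
    "0/0": "1 0 0",  # homozygous for allele A (REF)
    "0/1": "0 1 0",  # heterozygous
    "1/0": "0 1 0",  # heterozygous (phase is discarded)
    "1/1": "0 0 1",  # homozygous for allele B (ALT)
}


def vcf_line_to_gen(line):
    """Convert one biallelic VCF data line to one genotype-format line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, snp_id, ref, alt = fields[:5]
    if "," in alt:
        raise ValueError("multi-allelic site; split or drop it first")
    gt_index = fields[8].split(":").index("GT")
    triples = []
    for sample in fields[9:]:
        # Treat phased (|) and unphased (/) calls the same way.
        gt = sample.split(":")[gt_index].replace("|", "/")
        # Anything not recognised (e.g. ./.) is written as missing.
        triples.append(GT_TO_PROBS.get(gt, "0 0 0"))
    if snp_id == ".":
        snp_id = f"{chrom}:{pos}"  # assumption: build an ID when none is given
    return " ".join([chrom, snp_id, pos, ref, alt] + triples)


def vcf_to_gen(vcf_lines):
    """Yield genotype-format lines, skipping VCF header and blank lines."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        yield vcf_line_to_gen(line)


example = "20\t14370\trs6054257\tG\tA\t29\tPASS\t.\tGT:GQ\t0|0:48\t1|0:48\t1/1:43\t./.:0"
print(vcf_line_to_gen(example))
# prints: 20 rs6054257 14370 G A 1 0 0 0 1 0 0 0 1 0 0 0
```

In practice you would also need a matching sample file, and you should check the output against the format description in the IMPUTE2 documentation before running anything on it.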


What should I do if I don't have a phased or unphased reference panel? I am working with a non-human species (a plant), and several of the required files are not available for it.

I only have the following data: a reference genome (how is a reference panel different from a reference genome?), BAM alignments of genomic resequencing data from several individuals, and SNPs/indels (VCF) called from these BAM alignments.


