Question: Which reference panel to use with Impute2
6.1 years ago by
United States
raja.venks20 wrote:

Dear All

I am new to bioinformatics. I am trying to impute the genotypes for a specific SNP, and I have managed to impute them using a phased reference panel (haplotypes generated with SHAPEIT, downloaded from the IMPUTE2 reference dataset page). I also have genotype data from a TaqMan assay for about two-thirds of the subjects. There is good agreement between the imputed genotypes and the experimental data. However, IMPUTE2 does not reliably predict genotypes in 15% of the population, so I would like to explore more exhaustive methods.
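The comparison described above can be sketched in Python. This is only an illustration under stated assumptions: the imputed output is taken to be IMPUTE2-style genotype probability triplets (AA, AB, BB) per subject, the 0.9 calling threshold is a commonly used but arbitrary choice, the function names are hypothetical, and the toy data are made up.

```python
from typing import Optional, Sequence, Tuple


def call_genotype(probs: Sequence[float], threshold: float = 0.9) -> Optional[int]:
    """Return 0 (AA), 1 (AB) or 2 (BB) for the most probable genotype,
    or None when no probability reaches the threshold (an uncertain call)."""
    best = max(range(3), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None


def concordance(
    imputed: Sequence[Sequence[float]],
    observed: Sequence[int],
    threshold: float = 0.9,
) -> Tuple[float, float]:
    """Return (call rate, concordance among confidently called subjects)."""
    called = 0
    matches = 0
    for probs, truth in zip(imputed, observed):
        call = call_genotype(probs, threshold)
        if call is None:
            continue  # subject not confidently imputed
        called += 1
        if call == truth:
            matches += 1
    call_rate = called / len(observed) if observed else 0.0
    conc = matches / called if called else 0.0
    return call_rate, conc


# Toy data, invented for illustration only.
imputed = [(0.98, 0.02, 0.0), (0.05, 0.93, 0.02), (0.40, 0.35, 0.25), (0.0, 0.08, 0.92)]
observed = [0, 1, 2, 1]  # assay genotypes coded as 0/1/2 copies of allele B
call_rate, conc = concordance(imputed, observed)
print(f"call rate = {call_rate:.2f}, concordance = {conc:.2f}")
```

Reporting the call rate separately from the concordance makes it easier to tell uncertain calls (no genotype reaches the threshold) apart from confident but wrong calls.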

1) Using pre-phased haplotype files is supposed to cost some accuracy. A more accurate prediction can be obtained (at the cost of more computing time) by using unphased reference data (the -g_ref option in IMPUTE2). However, I can't find much information about this option. Where can I download the reference data (1000 Genomes)? How do I convert it to the genotype format that can be used with IMPUTE2?

2) As per the IMPUTE2 help: "This procedure is not recommended for unphased reference panels that have high SNP density, such as those that result from resequencing studies of population samples. In that situation, there may be statistical convergence issues that could decrease the imputation quality". What do they mean by "high SNP density"? Is the SNP density of the 1000 Genomes data so high that it is unsuitable for imputation as an unphased reference panel?
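On the conversion asked about in point 1, a minimal Python sketch is given here. It assumes the input is a biallelic VCF data line with a GT field and that the target is the Oxford/IMPUTE2 genotype (.gen) layout: SNP ID, rsID, position, allele A, allele B, then three probabilities (AA, AB, BB) per individual. The function names are hypothetical, and whether -g_ref accepts exactly this layout should be checked against the IMPUTE2 documentation.

```python
def gt_to_triplet(gt: str) -> str:
    """Map a diploid VCF genotype such as '0/1' to a .gen probability triplet.
    Missing or non-diploid genotypes become '0 0 0' (unknown)."""
    alleles = gt.replace("|", "/").split("/")  # phase is ignored
    if len(alleles) != 2 or "." in alleles:
        return "0 0 0"
    if any(a not in ("0", "1") for a in alleles):
        raise ValueError(f"not a biallelic genotype: {gt}")
    return ("1 0 0", "0 1 0", "0 0 1")[alleles.count("1")]


def vcf_line_to_gen(line: str) -> str:
    """Convert one tab-separated VCF data line to one .gen line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, snp_id, ref, alt = fields[:5]
    if "," in alt:
        raise ValueError("multiallelic sites are not handled in this sketch")
    gt_index = fields[8].split(":").index("GT")  # position of GT in FORMAT
    triplets = [gt_to_triplet(s.split(":")[gt_index]) for s in fields[9:]]
    name = snp_id if snp_id != "." else f"{chrom}:{pos}"
    return " ".join([name, name, pos, ref, alt] + triplets)


# Made-up example line with four samples.
example = "22\t16050075\trs1\tA\tG\t.\tPASS\t.\tGT:DP\t0/0:10\t0|1:8\t1/1:5\t./.:0"
print(vcf_line_to_gen(example))
```

Header lines (starting with `#`) would need to be skipped before calling `vcf_line_to_gen`, and the reference allele is treated as allele A here.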

Thank you

4.9 years ago by
United Kingdom
tulsi11920 wrote:

You've probably figured it out by now, but just in case someone comes looking for answers...

You can download the 1000 Genomes phased reference datasets from the IMPUTE2 website, and they are already available in the format IMPUTE2 expects (.haps, .sample). Click on a dataset (e.g. Phase 3), scroll down on that page and download the .tgz file (1000GP_Phase3.tgz). Be aware that these files are quite large.
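Once the archive is unpacked, an imputation run against the phased panel might be set up roughly as in the sketch below. Treat it as an illustration, not a recipe: the per-chromosome file names inside the archive, the study file name, the interval, the output name and the helper function are all assumptions, while -m, -h, -l, -g, -int, -Ne and -o are standard IMPUTE2 options. The command is only assembled and printed, not executed.

```python
import shlex
from typing import List


def build_impute2_command(
    chrom: int,
    start: int,
    end: int,
    study_gen: str,
    ref_dir: str = "1000GP_Phase3",  # assumed name of the unpacked directory
    ne: int = 20000,
    executable: str = "impute2",
) -> List[str]:
    """Assemble an IMPUTE2 command line for one chromosome interval."""
    prefix = f"{ref_dir}/1000GP_Phase3_chr{chrom}"  # assumed file naming
    return [
        executable,
        "-m", f"{ref_dir}/genetic_map_chr{chrom}_combined_b37.txt",  # recombination map
        "-h", f"{prefix}.hap.gz",      # reference haplotypes
        "-l", f"{prefix}.legend.gz",   # legend describing the reference SNPs
        "-g", study_gen,               # study genotypes in .gen format
        "-int", str(start), str(end),  # interval to impute (base pairs)
        "-Ne", str(ne),                # effective population size
        "-o", f"chr{chrom}_{start}_{end}.impute2",
    ]


cmd = build_impute2_command(22, 20_000_000, 25_000_000, "study_chr22.gen")
print(shlex.join(cmd))
```

Building the command as a list keeps paths with spaces safe if it is later passed to `subprocess.run(cmd, check=True)`.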


What should I do if I don't have a phased or unphased reference panel? I am working with a non-human species (a plant), and several of the required files don't exist for it.

I only have the following data: a reference genome (how is a reference panel different from a reference genome?), BAM alignments from genomic resequencing data for several individuals, and SNPs/InDels (VCF) called from these BAM alignments.


written 4.5 years ago by kirannbishwa011.2k