This thread is accumulating traffic, so I thought I would provide some guidance.
From what I can gather, qctool became 'outdated' over time. If your aim is to perform basic QC after an IMPUTE2 imputation, then gauge quality via the r^2 INFO scores that are contained in the *summary files produced by IMPUTE2. A score of 1 indicates perfect imputation. For further information, please use a search engine to search for the IMPUTE2 INFO score.
I never actually filter out any imputed variants based on INFO score: all are retained. What I do is, as I loop over all imputed 'chunks', build a list of variants that have INFO score >= 0.9. This list is kept long term and can be used to filter the final VCF/BCF.
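As a rough sketch of how such a list could be built, the helper below prints the IDs of variants whose INFO score meets a threshold. It is not my exact script: the function name high_info_variants is hypothetical, and it assumes each chunk has a space-delimited per-variant file with a header row that names an rs_id column and an info column, so check the header of your own files first.

```shell
# Hypothetical helper: print the rs_id of every variant with INFO >= threshold.
# Assumption: space-delimited input with a header row naming 'rs_id' and 'info'.
high_info_variants() {
  threshold="$1"; shift
  awk -v min="$threshold" '
    FNR == 1 {                 # header line of each file: locate columns by name
      id = 0; info = 0
      for (i = 1; i <= NF; i++) {
        if ($i == "rs_id") id = i
        if ($i == "info")  info = i
      }
      next
    }
    # Data lines: keep the variant only if both columns were found in the header
    id && info && ($info + 0) >= min { print $id }
  ' "$@"
}

# Example over all chunks (file names are placeholders):
# high_info_variants 0.9 chunk*_info | sort -u > high_info_variants.txt
```

The resulting list can later be given to whichever tool you use on the final VCF/BCF; with bcftools, for example, an expression of the form ID=@high_info_variants.txt selects records by ID.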
To convert your IMPUTE2 data to other formats, I recommend first converting to VCF, which is the most standardised format for genetics data and from which it should be easy to convert to any other format. For example, PLINK will easily import VCF data and, from there, you can export again to many other formats, including the 'Oxford' GEN format (see https://www.cog-genomics.org/plink/1.9/data#recode).
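As an illustration of the PLINK route, here is a minimal sketch that assumes PLINK 1.9 is installed as plink and that a VCF such as chunk1.vcf already exists. The wrapper name vcf_to_oxford and the file names are placeholders of mine.

```shell
# Hypothetical wrapper: VCF -> Oxford GEN/SAMPLE via PLINK 1.9.
vcf_to_oxford() {
  vcf="$1"      # input VCF, e.g. chunk1.vcf
  prefix="$2"   # output prefix; PLINK writes <prefix>.gen and <prefix>.sample
  plink --vcf "$vcf" --recode oxford --out "$prefix"
}

# Example (file names are placeholders):
# vcf_to_oxford chunk1.vcf chunk1
```

Other --recode modifiers listed on the page linked above can be swapped in for oxford in the same way.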
IMPUTE2 output can be converted to VCF via:
# Temporarily add the .haps extension
mv chunk1_haps chunk1_haps.haps
# Convert the haplotypes to VCF
shapeit -convert \
  --input-haps chunk1_haps.haps \
  --output-vcf chunk1.vcf
# Restore the original file name
mv chunk1_haps.haps chunk1_haps
Note that the conversion needs a .haps extension on your IMPUTE2 output files, which is why the file is renamed before the shapeit call and renamed back afterwards, as elaborated here:
I have an entire pre-phasing and imputation workflow here (last tested in 2019/20):
- Phasing with SHAPEIT
- ERROR: You must specify a valid interval for imputation using the -int argument, -use_prephased_g: command not found, in IMPUTE2
Yes, I also became tired of all of these messy threads on imputation tools.