I'm trying to use IMPUTE to impute additional SNPs into my genotype (GT) data using reference panels. I followed the IMPUTE2 tutorial, and its example/toy data works fine. However, the run gets killed when I use IMPUTE2 on my own data, so I switched to the faster IMPUTE4 and IMPUTE5 tools. The command runs to completion, but the imputed output file has an unexpected number of columns (it should be 3N+5). I noticed this when I tried to recode the output into VCF: PLINK reports that, given the .sample file, the imputed .gens file is expected to have X columns.
To confirm, I ran an awk `print NF`. The file does start with the correct 5 columns (rsid1 id2 pos a1 a2), but the number of columns (NF) is indeed larger than in the pre-imputation .gens file. My understanding is that the number of columns should stay constant at 3N+5 while the number of rows (SNPs) increases after imputation. The row count did increase, but why did the column count increase as well? I don't understand how that happened.
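To narrow down where the extra columns appear, one option is to compare the 3N+5 count implied by the .sample file with the column counts actually present in each file, before and after imputation. The bash sketch below does this with awk. The function names `expected_gen_cols` and `gen_col_counts` are hypothetical helpers, and the sketch assumes the standard Oxford .sample layout: two header lines (column names, then column types) followed by one line per sample.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare the 3N+5 column count implied by a
# .sample file with the column counts actually present in a .gen file.

# Print 3N+5 for a .sample file with N samples.
# Assumes two header lines (names, types), then one non-empty line per sample.
expected_gen_cols() {
  local sample_file=$1
  awk 'NR > 2 && NF > 0 { n++ } END { print 3 * n + 5 }' "$sample_file"
}

# Print each distinct column count (NF) in a .gen file, followed by how
# many rows have it. Accepts plain or gzip-compressed (.gz) input.
gen_col_counts() {
  local gen_file=$1
  if [[ $gen_file == *.gz ]]; then
    gzip -dc -- "$gen_file"
  else
    cat -- "$gen_file"
  fi | awk '{ count[NF]++ } END { for (nf in count) print nf, count[nf] }' | sort -n
}
```

With the file names used in this post, that would be `expected_gen_cols plink.sample`, then `gen_col_counts chr22.gens` and `gen_col_counts test22againnewtzf.gen.gz`. Under the 3N+5 layout each file should show a single column count equal to the expected value; several distinct counts in one file would suggest that some rows follow a different layout.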
This is the command I used. The reference panels are from the reference download link on the IMPUTE2 page, and the .gens and .strand files were generated by PLINK from the binary fileset (.bim, .bed, .fam):
../impute4.1.2_r300.1 \
  -h ../1kgenomes2/1000GP_Phase3/1000GP_Phase3_chr22.hap.gz \
  -l ../1kgenomes2/1000GP_Phase3/1000GP_Phase3_chr22.legend.gz \
  -m ../1kgenomes2/1000GP_Phase3/genetic_map_chr22_combined_b37.txt \
  -g chr22.gens \
  -strand chr22.strand.txt \
  -no_maf_align \
  -int 17057138 23057138 \
  -o test22againnewtzf \
  -o_gz
Thank you so much in advance for any helpful input,
For completeness, this is the command I use to convert the .gen and .sample files to VCF:
plink \
  --gen test22againnewtzf.gen.gz \
  --sample plink.sample \
  --oxford-single-chr 1 \
  --recode vcf \
  --out neuroximpute2.plink
Error: Unexpected number of columns in .gen file (1862 or 1863 expected).
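The numbers in the error can be worked backwards. With 5 leading columns, 1862 = 3N + 5 gives N = 619 samples, and 1863 = 3 × 619 + 6 fits the same 619 samples plus one extra leading column (presumably an optional chromosome column; that reading is an assumption here, not something the error states). The hypothetical helper `implied_samples` below does the same arithmetic for any column count, for example one reported by the awk NF check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given a .gen column count, print the sample count it
# implies with 5 leading columns and with 6 (the 6-column case assumes one
# optional extra leading column, as "1862 or 1863" suggests).
implied_samples() {
  local nf=$1
  local lead
  for lead in 5 6; do
    if (( (nf - lead) % 3 == 0 )); then
      echo "$lead leading columns: N = $(( (nf - lead) / 3 ))"
    else
      echo "$lead leading columns: not a whole number of samples"
    fi
  done
}

implied_samples 1862
# 5 leading columns: N = 619
# 6 leading columns: not a whole number of samples
```

Running it on the column count observed in the imputed file shows whether that count corresponds to a whole number of samples at all, and if so, how many.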