I received imputed dosage files from the Michigan Imputation Server (minimac3). The files are in VCF format and gzip-compressed (.gz). I used DosageConvertor to convert them to plink dosage files; these files are also compressed.
When I try to use a compressed plink dosage file (for example, fileName.plink.dosage.gz) in a linear regression using plink 1.07, plink returns the error "ERROR: Bad format fdr (sic) dosage file, expecting more columns". However, if I uncompress the file and run the same analysis, plink produces no error and completes the analysis. I included the --Zin argument with the compressed file and omitted it with the uncompressed file.
I used wc to count the number of lines in the compressed file, and the returned line count was what it should be. I also counted the number of “words”, and that count was correct too. The number of columns should equal the number of words divided by the number of lines, since each line should have the same number of words. But since plink is not finding the number of columns it expects, does this mean a delimiter is missing somewhere, or that there is a delimiter where there should not be one?
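Note that the total word count divided by the line count only gives an average, so a line with one field too many could cancel out a line with one too few. A per-line check is stricter. The following is a minimal Python sketch of such a check, not something that was part of the original run; the function name field_count_histogram is hypothetical, and it assumes the file is gzip-compressed, whitespace-delimited text.

```python
import gzip
import sys
from collections import Counter


def field_count_histogram(path):
    """Map 'number of whitespace-separated fields' to 'number of lines'.

    Assumes `path` is a gzip-compressed text file. A well-formed dosage
    file should produce exactly one key in the result.
    """
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # str.split() with no argument splits on any run of spaces/tabs
            counts[len(line.split())] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print one row per distinct field count: <fields> <lines>
    for n_fields, n_lines in sorted(field_count_histogram(sys.argv[1]).items()):
        print(n_fields, n_lines)
```

If this prints more than one row, at least one line has a different number of fields from the rest.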
I believe plink files are whitespace (space or tab) delimited. Nonetheless, I used sed to change each tab to a single space and to collapse consecutive spaces into a single space. But plink is still unable to read the compressed file.
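For comparison, the same kind of normalization can be done while keeping the file compressed throughout. This is only a sketch under stated assumptions: the function name normalize_whitespace and both paths are hypothetical, and it differs from the sed step in that it also strips leading and trailing whitespace (including carriage returns) and drops blank lines.

```python
import gzip


def normalize_whitespace(src_path, dst_path):
    """Rewrite a gzip text file with single-space delimiters.

    Reads src_path line by line, splits on any run of whitespace,
    and writes the fields joined by one space to dst_path (gzip).
    Blank lines are skipped and line endings are written as "\n".
    """
    with gzip.open(src_path, "rt") as src, \
         gzip.open(dst_path, "wt", newline="\n") as dst:
        for line in src:
            fields = line.split()
            if fields:
                dst.write(" ".join(fields) + "\n")
```

A call such as normalize_whitespace("fileName.plink.dosage.gz", "fileName.normalized.dosage.gz") would write a new compressed file and leave the original untouched; the output name here is made up for illustration.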
Of course, I can run the analysis with uncompressed files, but it would be nice to keep the files compressed. Can anyone suggest what the problem might be?
All advice is appreciated, Paul