2.5 years ago

I received imputed dosage files from the Michigan Imputation Server (minimac3). The files are in VCF format and gzip-compressed (.gz). I used DosageConvertor to convert them to plink dosage files, which are also compressed.

When I try to use a compressed plink dosage file (for example, fileName.plink.dosage.gz) in a linear regression using plink 1.07, plink returns the error "ERROR: Bad format fdr (sic) dosage file, expecting more columns". However, if I uncompress the file and run the same analysis, plink produces no error and completes the analysis. I did include the argument --Zin with the compressed file and omitted the argument when using the uncompressed file.

I used wc to count the number of lines in the compressed file, and the returned line count was what it should be. I also counted the number of “words”, and the returned count was correct. The number of columns should equal the number of words divided by the number of lines, since every line should have the same number of words. But since plink is not finding the expected number of columns, does this mean a delimiter is missing, or present where it should not be?
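The words-per-line reasoning above only holds on average, so one way to test it directly is to tabulate the number of fields on every line of the decompressed stream. This is a minimal sketch, assuming gzip and awk are available; the helper name count_fields is made up for illustration, and the file name is the example one from this post.

```shell
# Tabulate how many whitespace-delimited fields (spaces or tabs) each line has.
# Output format: "<field count> <number of lines with that count>".
# A well-formed dosage file should produce exactly one output row.
count_fields() {
    gzip -dc "$1" | awk '{ n[NF]++ } END { for (k in n) print k, n[k] }' | sort -n
}

# Usage (example file name from this post):
# count_fields fileName.plink.dosage.gz
```

If more than one row comes back, the lines with the odd field count are the ones worth inspecting.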

I believe plink files are whitespace (space or tab) delimited. Nonetheless, I used sed to change each tab to a single space and to collapse consecutive spaces into a single space. But plink is still unable to read the compressed file.

Of course, I can run the analysis with uncompressed files, but it would be nice to keep the files compressed. Can anyone suggest what the problem might be?


Hi Paul, I was wondering whether you managed to solve the problem of using compressed dosage files? I am at the same stage right now, having received my dosage files from the Michigan Imputation Server. I have used DosageConvertor to convert the files to plink dosage format and now have a set of compressed plink.dosage files. I need to perform some QC filtering on these (MAF and HWE) and was wondering whether it is better to uncompress the files before performing these steps.

2.5 years ago

The problem is that you are still using plink 1.07 for dosage analysis. plink 2.0 can read VCF dosages directly, and supports the full range of linear/logistic regression options on dosage data rather than the limited set offered by plink 1.x --dosage.


Hi,

Do you mean we can use plink to run logistic regression on dose.vcf.gz directly? Could you share the plink command for this? I have tried many different commands, but none has worked so far. Thank you for your help.


This depends on how your dosages are encoded in the VCF, but something like

plink2 --vcf [VCF path] dosage=DS --pheno [phenotype file] --pheno-name [phenotype name] --glm


should work.


Thank you for your help. So, if I want to add some covariates, could I just add --covar ...? Or do I need a different command?


Yes, use --covar for that.
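For instance, here is a minimal sketch of the earlier command extended with covariates. Every file name, the phenotype name, the covariate column names (age, PC1, PC2) and the output prefix are placeholders assumed for illustration, so substitute your own. The script only assembles and prints the command so that it can be reviewed before running.

```shell
# Placeholders (assumptions, not from this thread): adjust to your own data.
vcf="dose.vcf.gz"          # imputed VCF with a DS (dosage) FORMAT field
pheno="pheno.txt"          # phenotype file
pheno_name="case_status"   # phenotype column to analyse
covar="covar.txt"          # covariate file with a header line
out="glm_dosage"           # output file prefix

# --covar loads the covariate file; --covar-name restricts the model to the
# listed columns (omit it to use every column in the covariate file).
cmd="plink2 --vcf $vcf dosage=DS --pheno $pheno --pheno-name $pheno_name --covar $covar --covar-name age PC1 PC2 --glm --out $out"

# Print the assembled command for review; paste it into the shell to run it.
echo "$cmd"
```

By default --glm also reports a result row for each covariate; adding the hide-covar modifier (--glm hide-covar) should limit the output to the variant rows.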