Question: How to convert vcf dosage file format to mldose format?
6.1 years ago
United States
genetic40 wrote:

Recently, we imputed our data using

However, the output is VCF dosage data, so I cannot use it directly with Mach2dat for the GWAS analysis.

Does anyone know how to convert a VCF dosage file to mldose format?

Thank you so much!


Dosage file format (mldose for Mach2dat input format):

10009->QZ0526   DOSE    1.997   2.000   1.285   1.997   1.996   1.997   1.994   1.999   0.735   1.750   1.936   

10010->QZ0488   DOSE    0.783   0.002   1.996   0.853   0.011   0.791   0.830   1.998   0.930   1.996   1.987   


VCF file format from imputation:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  QZ0001  QZ0002  QZ0003  QZ0006  

20      60479   20:60479        C       T       .       PASS    MAF=0.00078;R2=0.01037  GT:DS   0|0:0.001       0|0:0.009       0|0:0.008       0|0:0.000  

20      60522   20:60522        T       TC      .       PASS    MAF=0.00249;R2=0.02914  GT:DS   0|0:0.001       0|0:0.002       0|0:0.000       0|0:0.002      


imputation vcf • 4.2k views
modified 4.4 years ago by lara.sucheston0 • written 6.1 years ago by genetic40

That VCF would be easily converted to a PLINK .dat file - would that suit your purpose?

written 6.0 years ago by coleman_jonathan440

Could you be more specific on that? Thank you.

written 5.5 years ago by yfang0

Yeah, could you elaborate? I only know how to use PLINK to convert to a ped file.

written 4.2 years ago by xliu425510

Hi, I have the same problem. Have you solved it? Could you share the solution?

written 4.4 years ago by fatima20
4.4 years ago
lara.sucheston0 wrote:

If you are open to using different software, you can run both binary and quantitative GWAS directly on the Michigan server output with SNPTEST. Note that this example reads the genotype probability (GP) field rather than the hard-call genotype (GT) field, so the VCF must contain a GP field.

Commands are as follows:

snptest \
  -data yourvcf.vcf.gz sample_file.sample \
  -genotype_field GP \
  -o outputfilename.txt \
  -frequentist 1 \
  -method score \
  -pheno Affected \
  -cov_names cov1 cov2 cov3   # should you need covariates
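If you would rather stay with Mach2dat, the conversion itself is mostly a transpose: the VCF has one row per variant with a DS value per sample, while mldose has one row per sample with a dose per marker. Below is a minimal Python sketch of that idea, not a tested pipeline. The function name `vcf_to_mldose` and the `family_id` parameter are made up for illustration, and the sketch assumes a tab-delimited VCF in which every record has a DS entry in FORMAT. When no family ID is given it reuses the sample ID on both sides of `->`, which is an assumption about what your pedigree looks like. It also holds all doses in memory, so it only suits per-chromosome or chunked files.

```python
def vcf_to_mldose(vcf_lines, family_id=None):
    """Transpose per-variant DS values into per-sample mldose rows.

    vcf_lines: iterable of text lines from a tab-delimited VCF.
    family_id: hypothetical option; if None, the sample ID is reused as family ID.
    Returns (marker_ids, rows), each row being 'FAM->IID<TAB>DOSE<TAB>d1<TAB>d2...'.
    """
    samples, markers, columns = [], [], []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs start at the 10th column
            continue
        # Locate DS inside FORMAT (e.g. "GT:DS"); raises ValueError if it is absent.
        ds_index = fields[8].split(":").index("DS")
        markers.append(fields[2])
        columns.append([entry.split(":")[ds_index] for entry in fields[9:]])

    rows = []
    for i, sample in enumerate(samples):
        fam = family_id if family_id is not None else sample
        doses = "\t".join(col[i] for col in columns)
        rows.append(f"{fam}->{sample}\tDOSE\t{doses}")
    return markers, rows


# Two records and two samples taken from the VCF excerpt in the question.
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tQZ0001\tQZ0002\n",
    "20\t60479\t20:60479\tC\tT\t.\tPASS\tMAF=0.00078;R2=0.01037\tGT:DS\t0|0:0.001\t0|0:0.009\n",
    "20\t60522\t20:60522\tT\tTC\t.\tPASS\tMAF=0.00249;R2=0.02914\tGT:DS\t0|0:0.001\t0|0:0.002\n",
]
markers, rows = vcf_to_mldose(example)
print("\n".join(rows))
```

For a real file you could pass a handle from `gzip.open("yourvcf.vcf.gz", "rt")` instead of the list. Two things to check before trusting the result: DS in the VCF counts the ALT allele, whereas the allele counted in an mldose file is defined by the accompanying marker info file, so the values may need to be flipped (2 - DS) to match. Mach2dat also expects that marker info file alongside the doses, and the returned `markers` list only gives you the marker names for it.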

Hope this helps

written 4.4 years ago by lara.sucheston0


