How to fix phenotype information when converting from VCF to PLINK
10 weeks ago
Diego • 0

I have a series of VCF files (one for each chromosome) that have incorrect values for the phenotype (all samples have the value "-9"). I have read in other questions that the --make-pheno or --pheno flags can be used to add phenotype information when creating the PLINK format files. But I am unsure whether this would work correctly, since the VCF files "incorrectly" already provide that information.

plink --bfile file --make-pheno phenotype_file.phe --make-bed --out out_file

Would this work correctly even though the VCF files contain the PHENO column? Also, how can I generate the phenotype file? Can it be done in a program like Excel, exported to CSV, and then somehow converted to .phe format?

Any help is appreciated!

9 weeks ago
Diego • 0

I figured it out, for anyone who might be trying something similar: the --make-pheno flag lets you add or fix the phenotype information when there is a problem with it. VCF files do not store phenotypes at all, which is why PLINK sets every sample to -9 (missing) on import; --make-pheno fills those values in. Instead of a .phe file, a tab-separated .txt file works fine. The file should contain three columns, FID, IID and PHENO, without a header row. In many cases the first two columns contain exactly the same IDs, and the third column holds the phenotype. The phenotype can be a string (e.g. 'POS' and 'NEG'); you just need to give the value that marks a case in the command, as follows:

plink --vcf file.vcf --make-pheno phenotype_file.txt POS --make-bed --out out_file
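Since the question mentions one VCF per chromosome, the same command can be repeated in a loop. This is a minimal bash sketch, assuming the files are named chr1.vcf.gz, chr2.vcf.gz, ... (a hypothetical naming scheme, adjust the glob to yours) and that PLINK 1.9 is on the PATH as `plink`. The function name `convert_all_vcfs` is also just for illustration:

```shell
#!/usr/bin/env bash
# Apply one phenotype file to every per-chromosome VCF.
# ASSUMPTION: files are named chr*.vcf.gz and PLINK 1.9 is installed as `plink`.
convert_all_vcfs() {
  local pheno_file=$1   # headerless, tab-separated FID/IID/PHENO file
  local case_value=$2   # column-3 value that marks a case, e.g. POS
  local vcf out
  for vcf in chr*.vcf.gz; do
    [ -e "$vcf" ] || continue          # glob matched nothing: skip
    out=${vcf%.vcf.gz}                # chr1.vcf.gz -> chr1
    plink --vcf "$vcf" \
          --make-pheno "$pheno_file" "$case_value" \
          --make-bed --out "$out" || return 1   # stop on the first failure
  done
}
```

Call it as `convert_all_vcfs phenotype_file.txt POS`; each chromosome then gets its own .bed/.bim/.fam set. If the VCF sample IDs contain underscores, PLINK 1.9 may split them into FID and IID on import; adding `--double-id` to the command keeps both columns equal to the full sample ID, which matches a phenotype file whose first two columns are identical.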

Excel works great for making the phenotype_file.txt file!
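Excel can save a sheet directly as tab-delimited text; if it was exported as CSV instead, a small conversion step is needed. This is a sketch, assuming a two-column CSV (sample ID, phenotype) with a header row and no quoted fields or embedded commas. The function `csv_to_pheno` and the file names are hypothetical:

```shell
#!/usr/bin/env bash
# Convert an Excel-exported CSV (ID,phenotype with a header row) into the
# headerless, tab-separated FID/IID/PHENO file used with --make-pheno.
# ASSUMPTION: two plain columns, no quoting, no commas inside values.
csv_to_pheno() {
  local csv=$1 out=$2
  # tail drops the header row; tr strips the Windows CR characters Excel
  # writes; awk repeats the sample ID as both FID and IID and skips blank lines.
  tail -n +2 "$csv" | tr -d '\r' |
    awk -F',' 'BEGIN { OFS = "\t" } NF >= 2 { print $1, $1, $2 }' > "$out"
}
```

For example, `csv_to_pheno phenotypes.csv phenotype_file.txt` turns a row `sample1,POS` into `sample1<TAB>sample1<TAB>POS`.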


