Question: Phased Data In Plink
9.6 years ago by
Pierre490 wrote:

Hello everybody,

I wonder whether there is an easy way to preserve haplotypes when performing some basic operations with plink. Are there any options for this? I couldn't find any on my own.

The situation is as follows: we have phased data in the .ped and .map format. We want to apply some filters (e.g. keep SNPs with a minor allele frequency above 5% in our set of individuals, keep a subset of individuals, etc.). But we found that plink does not keep the phase in this case. Everything is mixed up in the output files.

Thanks for the support! Regards, Pierre

plink genotyping • 7.0k views
modified 2.1 years ago by Shicheng Guo8.4k • written 9.6 years ago by Pierre490
9.4 years ago by
Pablo Marin-Garcia1.8k wrote:

When you recode your ped file, plink sets the minor allele as A1. plink does not guarantee that it will keep your phase, but you can probably keep your alleles in the order you entered them (and so keep the phase) if you use --keep-allele-order.

modified 9.4 years ago • written 9.4 years ago by Pablo Marin-Garcia1.8k
9.6 years ago by
Boston, MA USA
Larry_Parnell16k wrote:

This is one reason we do not use PLINK. We prefer to use HelixTree by Golden Helix, where such issues are not a problem.

written 9.6 years ago by Larry_Parnell16k

I will check it out!

reply written 9.6 years ago by Pierre490
2.1 years ago by
Shicheng Guo8.4k wrote:

plink2 solved the problems you mentioned here: link

plink2 --vcf chr1.vcf --make-pgen --out chr1

The --pfile flag usually causes the binary fileset prefix.pgen + prefix.pvar + prefix.psam to be referenced, while --pgen/--pvar/--psam let you fully name one file at a time. New features supported by these formats include:

* Reliable tracking of REF vs. ALT alleles.
* Computationally efficient compression of low-MAF and high-LD variants.
* Phased genotypes.
* Dosages.
* VCF-style header information (including species-specific chromosome info, so you don't have to constantly use --chr-set).
* Multiallelic variants.
* Multiple phenotypes.
* Named categorical phenotypes.
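
If the phased data are kept in VCF, one quick sanity check after any conversion or filtering step is to count how many genotype calls still use the phased separator `|` rather than `/`. The following is only a minimal sketch: the function name `phased_fraction` is made up for illustration, and it assumes an uncompressed, tab-delimited VCF whose FORMAT column contains a GT field.

```python
def phased_fraction(vcf_lines):
    """Return the fraction of non-missing diploid GT calls that are phased.

    Assumes plain-text VCF lines (hypothetical helper, not part of plink2).
    """
    phased = 0
    total = 0
    for line in vcf_lines:
        # Skip meta lines, the header line, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        gt_index = format_keys.index("GT")
        for sample in fields[9:]:
            gt = sample.split(":")[gt_index]
            # Ignore missing calls and haploid calls (no separator).
            if "." in gt or ("|" not in gt and "/" not in gt):
                continue
            total += 1
            if "|" in gt:
                phased += 1
    return phased / total if total else 0.0
```

Running it on the file before and after a step (for example with `open("chr1.vcf")` as the input) gives two numbers that should match if the phase information survived.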

written 2.1 years ago by Shicheng Guo8.4k
9.6 years ago by
San Diego, CA
Stephanie20 wrote:

I also couldn't find anything on PLINK's website that lets you input a phased format.

Have you tried just getting the MAF report (--freq), finding which SNPs fail the threshold you set, and then having PLINK remove those specific SNPs (--exclude snplist.txt)? It isn't quite as elegant but might work.
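
The middle step (finding which SNPs fail the threshold) could be scripted roughly as below. This is a sketch under assumptions: it expects the whitespace-delimited report written by `--freq` in PLINK 1.x, with a header row containing `SNP` and `MAF` columns, and the function name `snps_below_maf` is made up for illustration.

```python
def snps_below_maf(frq_lines, threshold=0.05):
    """Return IDs of SNPs whose MAF is below the threshold.

    Assumes a header row with 'SNP' and 'MAF' columns (PLINK 1.x .frq layout).
    SNPs with MAF reported as 'NA' are left out of the returned list.
    """
    rows = (line.split() for line in frq_lines if line.strip())
    header = next(rows, None)
    if header is None:
        return []
    snp_col = header.index("SNP")
    maf_col = header.index("MAF")
    failing = []
    for row in rows:
        if row[maf_col] == "NA":
            continue
        if float(row[maf_col]) < threshold:
            failing.append(row[snp_col])
    return failing
```

Writing the returned IDs one per line to snplist.txt gives a file that can be passed to --exclude.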

written 9.6 years ago by Stephanie20

Thanks Stephanie. It could be an option, BUT I think plink mixes up the phase in any case. If you run the simple command plink --file input --recode --out output (so nothing is done and there shouldn't be any differences between input and output), the phase is lost anyway. So, OK, there is no way to keep the phase using plink. :-S
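
To see exactly where the allele order was changed by such a run, the input and output .ped files can be compared genotype by genotype. A minimal sketch, assuming both files list the same individuals and the same SNPs in the same order, with six leading columns before the alleles; the function name `phase_changes` is made up for illustration.

```python
def phase_changes(ped_in_lines, ped_out_lines):
    """List (individual_id, snp_index) pairs whose two alleles were swapped.

    Assumes both inputs have identical row order and SNP order.
    """
    def allele_pairs(fields):
        # Columns 7 onward hold two alleles per SNP.
        alleles = fields[6:]
        return list(zip(alleles[0::2], alleles[1::2]))

    changes = []
    for line_in, line_out in zip(ped_in_lines, ped_out_lines):
        fields_in, fields_out = line_in.split(), line_out.split()
        if not fields_in or not fields_out:
            continue
        pairs = zip(allele_pairs(fields_in), allele_pairs(fields_out))
        for snp_index, (before, after) in enumerate(pairs):
            # Same genotype, but the two alleles appear in reversed order.
            if before != after and before == after[::-1]:
                changes.append((fields_in[1], snp_index))
    return changes
```

An empty result means the allele order within every genotype is unchanged; any entry points at a heterozygous genotype whose order was flipped.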

reply written 9.6 years ago by Pierre490
9.4 years ago by
Pierre490 wrote:


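In the meantime we found a workaround that may be useful: recode each diploid individual as two haploid individuals, each written as homozygous for one haplotype. For example, the two diploid rows below become the four haploid rows that follow: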

Ind1 Ind1 0 0 0 0 A T G C
Ind2 Ind2 0 0 0 0 G C A T


Ind1 Ind1_a 0 0 0 0 A A G G
Ind1 Ind1_b 0 0 0 0 T T C C 
Ind2 Ind2_a 0 0 0 0 G G A A
Ind2 Ind2_b 0 0 0 0 C C T T

But this increases the size of your data about 2-fold.
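
The recoding shown above can be sketched in Python as follows. This assumes whitespace-delimited .ped rows with six leading columns, as in the example, and that the first allele of every genotype belongs to one haplotype and the second allele to the other. The function name `split_into_haploids` and the `_a`/`_b` suffixes follow the example but are otherwise just illustrative.

```python
def split_into_haploids(ped_line):
    """Turn one phased diploid .ped row into two homozygous 'haploid' rows."""
    fields = ped_line.split()
    family_id, individual_id = fields[0], fields[1]
    other_columns = fields[2:6]
    alleles = fields[6:]

    # First allele of each genotype -> haplotype a, second -> haplotype b.
    haplotype_a = alleles[0::2]
    haplotype_b = alleles[1::2]

    def as_row(suffix, haplotype):
        # Write each allele twice so plink sees a homozygous genotype.
        doubled = [copy for allele in haplotype for copy in (allele, allele)]
        new_id = f"{individual_id}_{suffix}"
        return " ".join([family_id, new_id, *other_columns, *doubled])

    return [as_row("a", haplotype_a), as_row("b", haplotype_b)]
```

Applied to every row of the input file, this produces a .ped file with twice as many rows, in which each row carries a single haplotype.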

modified 7.7 years ago by Giovanni M Dall'Olio27k • written 9.4 years ago by Pierre490

Hi Pierre,

I'm new to bioinformatics, and to PLINK (obviously). Sorry for asking quite a silly question... The PED files I've been given for analysis are also in the format you mentioned (since they are phased data). Will having two haploid entries from the same individual interfere with downstream analysis? I don't know if this question makes sense...


reply written 6.2 years ago by Cindy Chan20