Question: Plink Bed Format Confusion
kindlychung wrote (7.3 years ago):

I have been reading the description of the plink file formats and have a few questions.

First, how is the null distinguished from the first homozygote? In terms of binary bits they are exactly the same:

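    Byte  Bits  Genotype  Person  SNP
    1     00    G/G       1 1     snp1
    1     11    A/A       1 2     snp1
    1     10    0/0       1 3     snp1
    1     11    A/A       2 1     snp1
    2     11    A/A       2 2     snp1
    2     11    A/A       2 3     snp1
    2     00    (null)
    2     00    (null)

(Within each byte, the first row is the rightmost, lowest-order bit pair and the last row is the leftmost.)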
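To see where the two (null) pairs come from, here is a minimal Python sketch that packs the six snp1 genotypes into bytes, filling each byte from its lowest-order bit pair upward and leaving unused pairs in the last byte as zero. It assumes integer codes 0 for G/G, 3 for A/A and 1 for the missing 0/0 (that is, it assumes each pair in the table is written low bit first, so the 10 shown for 0/0 is the integer 1). `pack_snp` is a hypothetical helper, not part of plink.

```python
def pack_snp(codes):
    """Pack 2-bit genotype codes into bytes, first individual in the lowest bits.

    Unused pairs in the final byte stay 0, which is why they look
    identical to the code for the first homozygote.
    """
    out = bytearray((len(codes) + 3) // 4)  # 4 genotypes per byte, rounded up
    for i, code in enumerate(codes):
        out[i // 4] |= (code & 0b11) << (2 * (i % 4))
    return bytes(out)


# snp1 from the table: G/G, A/A, 0/0, A/A, A/A, A/A
# (assumed integer codes: 0 = G/G, 3 = A/A, 1 = missing)
packed = pack_snp([0, 3, 1, 3, 3, 3])
print([format(b, "08b") for b in packed])  # ['11011100', '00001111']
```

The top four bits of the second byte are the two (null) pairs: nothing was ever written there, so they are plain zeros.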

Second, is there any reason for reading the bits in the reverse order?
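Reading the bits "in reverse" amounts to placing the first individual in the lowest-order two bits of the byte. A minimal Python sketch (hypothetical helper `unpack_byte`) extracts the four pairs with a shift and a mask; the order it produces is the reverse of the order in which the byte is written out left to right.

```python
def unpack_byte(byte):
    """Return the four 2-bit codes in a byte, lowest-order pair first."""
    return [(byte >> shift) & 0b11 for shift in (0, 2, 4, 6)]


b = 0b11011100
print(format(b, "08b"))   # 11011100  (written high bit to low bit)
print(unpack_byte(b))     # [0, 3, 1, 3]  (read low pair to high pair)
```

With this convention, individual i of a SNP sits in byte i // 4 at bit offset 2 * (i % 4).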

Third, it would be more natural and intuitive to encode (homozygote 1, heterozygote, homozygote 2) as (00, 01, 10), which in decimal is just (0, 1, 2). What is the motivation behind designating homozygote 2 as 11?
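For reference, the PLINK 1.9 file-format documentation lists the codes as binary values written high bit first: 00 = homozygous for the first allele in the .bim file, 01 = missing, 10 = heterozygous, 11 = homozygous for the second allele. The older 1.07 manual writes each pair in the order its bits are read (low bit first), so its 01 (heterozygote) and 10 (missing) are the same codes as 10 and 01 here. The sketch below uses illustrative names, not a plink API. One property of this assignment that can be checked directly, though it is not necessarily the designers' reason for it, is that for non-missing genotypes the number of set bits equals the number of copies of the second allele; the missing code has to be excluded first.

```python
# Integer value of each 2-bit code, as in the PLINK 1.9 .bed description.
# Names here are illustrative, not part of any plink API.
GENOTYPE_LABEL = {
    0b00: "homozygous, first allele",
    0b01: "missing",
    0b10: "heterozygous",
    0b11: "homozygous, second allele",
}

MISSING = 0b01


def second_allele_count(code):
    """Copies of the second allele listed in the .bim file, or None if missing."""
    if code == MISSING:
        return None
    return bin(code).count("1")  # 00 -> 0, 10 -> 1, 11 -> 2


for code in range(4):
    print(format(code, "02b"), GENOTYPE_LABEL[code], second_allele_count(code))
```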

plink • 2.3k views
modified 7.3 years ago by Aaronquinlan • written 7.3 years ago by kindlychung
zx8754 wrote (7.3 years ago):

I wouldn't go as far as using the word "myths". Plink is one of the most robust pieces of software, with good documentation.

Regarding the first point: they do look exactly the same, but (null) only marks the unused bits left in the last byte of snp1 once every individual has been read. Plink already knows how many individuals there are from the .fam file, so it knows where to stop. In the example above there are 6 individuals, so everything after the 6th individual's pair is padding, shown as (null), and is never read as a genotype. From the plink manual:

Finally, when we reach the end of a SNP (or if in individual-mode, the end of an individual) we skip to the start of a new byte (i.e. skip any remaining bits in that byte).
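The stopping rule can be sketched in Python as follows, assuming a SNP-major file and that the number of individuals has already been taken from the .fam file (one line per individual). Each SNP occupies ceil(n / 4) bytes; decoding reads codes from the lowest-order bits of each byte and stops after n individuals, so the padding pairs are never interpreted. `decode_snp` is a hypothetical helper, not a plink function.

```python
def decode_snp(block, n_individuals):
    """Decode one SNP-major block of a .bed file into n 2-bit codes.

    block: the bytes for this SNP (ceil(n / 4) of them).
    Pairs beyond the n-th individual are padding and are skipped.
    """
    n_bytes = (n_individuals + 3) // 4
    if len(block) < n_bytes:
        raise ValueError("block too short for the number of individuals")
    codes = []
    for i in range(n_individuals):
        byte = block[i // 4]             # 4 individuals per byte
        shift = 2 * (i % 4)              # first individual in the lowest bits
        codes.append((byte >> shift) & 0b11)
    return codes


# The two bytes of snp1 for the 6 individuals in the example.
snp1 = bytes([0b11011100, 0b00001111])
print(decode_snp(snp1, 6))  # [0, 3, 1, 3, 3, 3]; the last two pairs are ignored
```

Passing 8 instead of 6 would return two extra 0 codes, which is exactly the ambiguity in the question: the bytes alone cannot tell padding from a first-allele homozygote, and the individual count from the .fam file is what resolves it.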

The second and third points could simply be the programmers' design choices.

written 7.3 years ago by zx8754

