Question: Plink Bed Format Confusion
5.3 years ago by
kindlychung40 wrote:

I have been reading the description of the plink file formats and have some questions.

First, how is the null distinguished from the first homozygote? In terms of binary bits they are exactly the same:

            Genotype    Person    SNP

       00   G/G         1 1       snp1
     11     A/A         1 2       snp1
   10       0/0         1 3       snp1
 11         A/A         2 1       snp1


       11   A/A         2 2       snp1
     11     A/A         2 3       snp1
   00       (null)
 00         (null)

Second, is there any reason for reading the bits in the reverse order?

Third, it would be more natural and intuitive to encode (homozygote 1, heterozygote, homozygote 2) as (00, 01, 10), which in decimal is just (0, 1, 2). What is the motivation behind designating homozygote 2 as 11?

plink • 1.7k views
5.3 years ago by
zx87546.5k wrote:

I wouldn't go as far as using the word "myths". Plink is one of the most robust pieces of software, with good documentation.

Regarding the first point: the bits do look exactly the same, but (null) only marks padding, used when there are no more individuals left for that byte of snp1. Plink already knows where to stop from the fam file. In the example above there are 6 individuals, so any bits after the sixth genotype are read as (null). From the plink manual:

Finally, when we reach the end of a SNP (or if in individual-mode, the end of an individual) we skip to the start of a new byte (i.e. skip any remaining bits in that byte).
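
To make this concrete, here is a minimal Python sketch of how a reader could decode one SNP in SNP-major mode. It is not Plink's own code. The function names and the two example bytes (11011100 and 00001111, which I believe correspond to the table in the question) are my assumptions, and the sample count is hard-coded where a real reader would count the lines of the fam file. The point is that the padding bits are never looked at, so they cannot be confused with the first homozygote.

```python
# Two-bit genotype codes, taken from the low-order end of each byte.
# (The question's table prints each pair in reading order, so the
# missing code 0b01 appears there as "10".)
CODES = {0b00: "hom1", 0b01: "missing", 0b10: "het", 0b11: "hom2"}


def bytes_per_snp(n_samples: int) -> int:
    """Each SNP starts on a fresh byte; four genotypes fit in one byte."""
    return (n_samples + 3) // 4


def decode_snp(block: bytes, n_samples: int) -> list:
    """Decode exactly n_samples genotypes; trailing padding is skipped."""
    calls = []
    for i in range(n_samples):
        byte = block[i // 4]       # which byte holds sample i
        shift = 2 * (i % 4)        # the first sample sits in the lowest two bits
        calls.append(CODES[(byte >> shift) & 0b11])
    return calls


# Hypothetical example: 6 individuals, as in the question.
n_samples = 6
block = bytes([0b11011100, 0b00001111])  # assumed bytes for snp1
genotypes = decode_snp(block, n_samples)
print(genotypes)
# ['hom1', 'hom2', 'missing', 'hom2', 'hom2', 'hom2']
```

The last two bit pairs of the second byte are 00, but since the loop stops after 6 samples they are never interpreted as genotypes.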

The second and third points could simply be the programmers' design choices.

