I want to cluster HapMap project data using EIGENSTRAT, but I am having difficulty creating the genotype file. The EIGENSTRAT manual says:
The genotype file contains 1 line per SNP. Each line contains 1 character per individual: 0 means zero copies of reference allele. 1 means one copy of reference allele. 2 means two copies of reference allele. 9 means missing data.
Here is one row of my (very large) dataset:
rs4475691 C/T chr1 836671 CT CC CC CC CC CT CC TT CC CC CC NN (and so on...)
The columns are:
1st column: SNP ID
2nd column: alleles
3rd column: chromosome
4th column: position
The remaining columns are the individuals' genotypes. I know NN means missing data and should be encoded as 9 in the EIGENSTRAT format, but I am not sure how to encode the other genotypes (CC, CT and TT).
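A minimal Python sketch of one possible conversion, under stated assumptions. The function names (encode_genotype, chrom_code, convert_row, convert_file) are hypothetical. The first allele in the alleles column ("C" in "C/T") is assumed to be the reference allele, so each genotype becomes the number of copies of that allele (CC = 2, CT = 1, TT = 0, NN = 9). The same allele must then be listed as the reference allele in the matching .snp file. The genetic position, which is not in the input, is written as 0.0 (unknown), and chrX/chrY are recoded as 23/24. Any call other than NN that is not made up of the two listed alleles is treated as missing.

```python
from typing import Tuple

MISSING = "9"  # EIGENSTRAT code for a missing genotype


def encode_genotype(genotype: str, ref: str, alt: str) -> str:
    """Return the number of copies of `ref` as '0', '1' or '2', or '9' if missing."""
    if genotype == "NN":
        return MISSING
    # Assumption: anything that is not exactly two of the listed alleles
    # is treated as missing (you may prefer to flag such SNPs instead).
    if len(genotype) != 2 or any(allele not in (ref, alt) for allele in genotype):
        return MISSING
    return str(genotype.count(ref))


def chrom_code(chrom: str) -> str:
    """'chr1' -> '1'; X and Y are mapped to 23 and 24 (assumed numeric convention)."""
    c = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return {"X": "23", "Y": "24"}.get(c.upper(), c)


def convert_row(line: str) -> Tuple[str, str]:
    """Turn one input row into a .geno line and a .snp line."""
    fields = line.split()
    snp_id, alleles, chrom, pos = fields[:4]
    # Assumption: the first listed allele ('C' in 'C/T') is the reference allele.
    ref, alt = alleles.split("/")
    geno = "".join(encode_genotype(g, ref, alt) for g in fields[4:])
    # .snp columns: ID, chromosome, genetic position (0.0 = unknown),
    # physical position, reference allele, variant allele.
    snp = "\t".join([snp_id, chrom_code(chrom), "0.0", pos, ref, alt])
    return geno, snp


def convert_file(in_path: str, geno_path: str, snp_path: str) -> None:
    """Write matching .geno and .snp files, one line per SNP, in input order."""
    with open(in_path) as src, open(geno_path, "w") as geno_out, \
            open(snp_path, "w") as snp_out:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            geno, snp = convert_row(line)
            geno_out.write(geno + "\n")
            snp_out.write(snp + "\n")


if __name__ == "__main__":
    row = "rs4475691 C/T chr1 836671 CT CC CC CC CC CT CC TT CC CC CC NN"
    print(convert_row(row)[0])  # 122221202229
```

The .ind file (one line per individual) must list the individuals in the same order as the genotype columns, because each character position in a .geno line is matched to the corresponding line of the .ind file.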
Any help would be greatly appreciated.