Say I have a genotype matrix where I use the encoding:
0 = HOM_REF
1 = HET
2 = HOM_ALT
NA = missing genotype
For instance, this dummy genotype matrix with 3 variants and 3 samples:
Variant_1 0 1 1
Variant_2 1 1 NA
Variant_3 NA 0 1
etc.
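For concreteness, the dummy matrix could be held in NumPy roughly like this. It is only a sketch, assuming rows are variants, columns are samples, and `np.nan` stands in for NA; the variable names are made up for illustration.

```python
import numpy as np

# Rows are variants, columns are samples; np.nan marks a missing genotype (NA).
genotypes = np.array([
    [0, 1, 1],        # Variant_1
    [1, 1, np.nan],   # Variant_2
    [np.nan, 0, 1],   # Variant_3
])

# Fraction of samples with a missing genotype, per variant.
missing_rate = np.isnan(genotypes).mean(axis=1)
print(missing_rate)
```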
Do you first need to impute the genotype matrix so that it has no missing genotypes (NA values)?
Or do you set the NA values to something like -9 or -999? Wouldn't that heavily influence the output of the linear / logistic regression for variants with a lot of missing genotypes?
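To illustrate the concern about -9, here is a rough simulation sketch, not taken from any particular GWAS tool. It assumes a made-up true effect of 0.5 per alternate allele, an allele frequency of 0.3, and 30% of genotypes missing completely at random, then compares a simple linear fit that drops the missing samples with one that treats -9 as a real genotype value.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000
true_beta = 0.5  # assumed effect size, for illustration only

# Simulated genotypes (0/1/2) and a quantitative phenotype.
genotype = rng.binomial(2, 0.3, size=n_samples).astype(float)
phenotype = true_beta * genotype + rng.normal(0.0, 1.0, size=n_samples)

# Knock out 30% of the genotypes completely at random.
is_missing = rng.random(n_samples) < 0.3
observed = genotype.copy()
observed[is_missing] = np.nan


def fitted_slope(x, y):
    """Slope of an ordinary least-squares line y ~ x."""
    return np.polyfit(x, y, 1)[0]


# Option A: drop samples with a missing genotype for this variant.
keep = ~np.isnan(observed)
slope_dropped = fitted_slope(observed[keep], phenotype[keep])

# Option B: code missing genotypes as -9 and fit on all samples.
sentinel_coded = np.where(keep, observed, -9.0)
slope_sentinel = fitted_slope(sentinel_coded, phenotype)

print("slope, missing samples dropped:", slope_dropped)
print("slope, missing coded as -9:   ", slope_sentinel)
```

Under these assumptions the first slope should land near the simulated effect, while the second is pulled strongly towards zero, because the -9 values dominate the variance of the predictor without carrying any information about the phenotype.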