Is there a reason to limit the dummy coding of these alleles to 0 or 1 in this regression model example, instead of allowing values of 0, 1, or 2?

3.4 years ago

curious ▴ 750

This is not homework; it's just an example I found online.

They are testing for association between HLA alleles and a binary disease outcome here.

In the data frame they have columns DQDRa1 and DQDRa2 for the two haplotypes of the HLA gene DQDR. Each can be one of 10 different alleles, which they recode into a number of dummy variables D1, D6, D7, etc., where Di = 1 if the subject has at least one 'i' allele and Di = 0 otherwise. I don't get why they don't let the Di values go up to 2, as in the example at index 460. This individual carries two copies of allele 15 at DQDR, but they set the D15 column to 1 instead of 2. Why do this?

I think the regression is supposed to look like this, if that helps:

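library(data.table)  # fread() comes from data.table, not "data.frame"
data <- fread("http://www.math.chalmers.se/Stat/Grundutb/CTH/tms121/1011/diabetes.txt")
# Logistic regression of disease status Y on the allele dummy variables
fit <- glm(Y ~ D6 + D7 + D9 + D10 + D11 + D12 + D13 + D14 + D15 + D99, family="binomial", data=data)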
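
For anyone who wants to see the two codings side by side, here is a minimal sketch on a made-up data frame. The column names DQDRa1 and DQDRa2 come from the question; the genotype values and the names D15_carrier and D15_count are hypothetical and only serve the illustration.

```r
# Toy genotypes; the allele values are invented for illustration only.
geno <- data.frame(
  DQDRa1 = c(15, 6, 15, 7),
  DQDRa2 = c(15, 15, 6, 9)
)

# Carrier (0/1) coding: 1 if the subject has at least one copy of allele 15.
geno$D15_carrier <- as.integer(geno$DQDRa1 == 15 | geno$DQDRa2 == 15)

# Allele-count (0/1/2) coding: number of copies of allele 15.
geno$D15_count <- (geno$DQDRa1 == 15) + (geno$DQDRa2 == 15)

print(geno)
```

The first subject is homozygous for allele 15, so the two codings differ only in that row (1 versus 2).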

logistic regression gwas • 771 views


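I assume it is because they are testing a dominant model, in which carrying one copy of the allele (0/1) supposedly has the same effect as carrying two copies (1/1). Coding the variable as 0, 1, or 2 would instead correspond to an additive model, where each extra copy changes the log odds by the same amount.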
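
As a rough sketch of the difference, the two codings can be fitted to the same simulated data. Everything here is assumed for the illustration: the sample size, the allele frequency, the effect size, and the fact that the outcome is simulated under a dominant effect. None of it comes from the diabetes data set.

```r
set.seed(1)
n <- 500

# Hypothetical allele counts (0, 1 or 2 copies) for a single allele.
count <- rbinom(n, size = 2, prob = 0.3)
carrier <- as.integer(count > 0)

# Simulate disease status under a dominant effect (an assumption of this toy example).
y <- rbinom(n, size = 1, prob = plogis(-1 + 1.0 * carrier))

fit_dominant <- glm(y ~ carrier, family = binomial)  # 0/1 coding
fit_additive <- glm(y ~ count, family = binomial)    # 0/1/2 coding

# Dominant fit: log odds ratio for carriers versus non-carriers.
# Additive fit: log odds ratio per additional copy of the allele.
print(coef(fit_dominant))
print(coef(fit_additive))
print(AIC(fit_dominant, fit_additive))
```

Which coding is appropriate depends on the genetic model you want to test, so comparing the fits on real data (for example by AIC) is only a rough guide.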