Logistic regression is a common analysis tool for GWAS when your response variable of interest is qualitative. It comes as one of the standard tools in most GWAS packages (e.g. PLINK).

Most logistic regression models for GWAS would be set up as:

log(odds of disease) = \beta_0 + \beta_1*X

where X is the number of copies of the minor allele for a particular SNP of interest. However, suppose that my case-control data are matched (in my case, by age, BMI, reported ethnicity, and distance to procurement site). I don't think standard logistic regression (as I have outlined above) is valid. What does everybody do? I don't see options for this in packages like PLINK.

By matching cases and controls by age, BMI, etc., we are just trying to reduce confounding. I think logistic regression is still valid. Also, PLINK can handle other variables as covariates.

I know the logistic regression model can have other covariates. But the logistic regression model requires the responses to be independent. If they are matched pairs, the responses are not independent (if I know this 30-year-old African-American with high BMI has the disease, it changes the probability that another 30-year-old African-American with high BMI has the disease).
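For illustration only: one approach that is not raised in this thread, but is often used for 1:1 matched data, is conditional logistic regression, which conditions on each matched pair so that pair-level effects drop out. With one case and one control per pair and genotype as the only predictor, each pair contributes 1 / (1 + exp(-beta * d)) to the likelihood, where d is the case's minor-allele count minus the control's. Below is a minimal sketch under those assumptions; the function name and the toy differences are hypothetical, not data from this study.

```python
import math

def fit_paired_conditional_logit(diffs, n_iter=50):
    """Newton-Raphson for the 1:1 conditional logistic log-likelihood.

    diffs[i] = (minor-allele count of case i) - (minor-allele count of
    its matched control). Returns the estimated log odds ratio per allele.
    """
    beta = 0.0
    for _ in range(n_iter):
        grad = 0.0   # first derivative of the log-likelihood
        info = 0.0   # negative second derivative (observed information)
        for d in diffs:
            p = 1.0 / (1.0 + math.exp(-beta * d))
            grad += d * (1.0 - p)
            info += d * d * p * (1.0 - p)
        if info == 0.0:
            # Every pair is concordant (d == 0): no information about beta.
            break
        beta += grad / info
    return beta

# Hypothetical toy data: 6 pairs where the case carries one more minor
# allele, 3 pairs where the control does, and 4 concordant pairs.
diffs = [1] * 6 + [-1] * 3 + [0] * 4
beta_hat = fit_paired_conditional_logit(diffs)
print("per-allele odds ratio:", math.exp(beta_hat))
```

In this toy example only the discordant pairs carry information, and the estimate reduces to log(6/3). Real analyses with extra covariates or larger matched sets would need a full conditional likelihood implementation rather than this one-parameter sketch.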

How do you have matched people? Are they twins? I don't understand the point otherwise, since it seems that pairing them up would be restrictive (if you can't pair everyone) or inaccurate (you make some bad pairs). I think the more natural approach is to include the covariates (age, BMI, ethnicity, etc.) in the regression. Then you can use all your samples, without a possibly unnatural pairing. I guess this is basically what zx8754 said...
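As a rough sketch of what "include the covariates in the regression" means, here is the model log(odds of disease) = \beta_0 + \beta_1*X + \beta_2*age fitted by Newton-Raphson in plain Python. The helper names and the toy data are made up for illustration; a real analysis would use an established package, and whether this adjustment adequately handles the matching is exactly the open question in this thread.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_logistic(rows, y, n_iter=25):
    """Fit logistic regression; rows hold predictors without the intercept.

    Returns [beta_0, beta_1, ...], with beta_0 the intercept.
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * xi[i]
                for j in range(k):
                    hess[i][j] += w * xi[i] * xi[j]
        step = solve_linear(hess, grad)  # Newton step: H^{-1} * gradient
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Hypothetical toy data: (minor-allele count, age in decades).
rows = [(2, 3.0), (1, 4.5), (1, 5.0), (2, 6.1), (0, 5.5), (1, 3.2), (0, 6.5), (2, 4.0),
        (0, 3.1), (1, 4.4), (0, 5.2), (1, 6.0), (0, 5.6), (0, 3.3), (1, 6.4), (2, 4.1)]
y = [1] * 8 + [0] * 8  # first 8 rows are cases, last 8 are controls

beta = fit_logistic(rows, y)
print("intercept, per-allele log OR, age log OR:", beta)
```

Here beta[1] is the per-allele log odds ratio adjusted for age; further matching variables (BMI, ethnicity, etc.) would enter as extra columns in each row.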

They are paired by age, sex, BMI, and ethnicity. You are right that the pairing is not perfect. I did not design the study... I'm just the poor sob who gets to analyze the data.