GWAS: Matched Pairs Logistic Regression
11.2 years ago
bdeonovic ▴ 210

Logistic regression is a common analysis tool for GWAS when the response variable of interest is qualitative (e.g. case/control status). It comes as one of the standard tools in most GWAS packages (e.g. PLINK).

Most logistic regression models for GWAS would be set up as:

log(odds of disease) = \beta_0 + \beta_1*X

where X is the number of copies of the minor allele for a particular SNP of interest. However, suppose that my case-control data are matched (in my case, by age, BMI, reported ethnicity, and distance to procurement site). I don't think standard logistic regression, as outlined above, is valid in that setting. What does everybody do? I don't see options for this in packages like PLINK.

Matching cases and controls by age, BMI, etc. is just a way to reduce confounding. I think logistic regression is still valid. Also, PLINK can handle other variables as covariates.

I know the logistic regression model can have other covariates. The issue is that the logistic regression model requires the responses to be independent. With matched pairs the responses are not independent: if I know that this 30-year-old African-American with high BMI has the disease, it changes the probability that another 30-year-old African-American with high BMI has the disease.
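One standard way to account for this kind of pairing is conditional logistic regression, which conditions on each matched pair so that pair-level effects drop out of the likelihood. Nobody in this thread names that method, so treat the following as an illustrative sketch rather than the poster's analysis. It assumes strict 1:1 matching and a single additive SNP term, in which case the conditional likelihood reduces to an intercept-free logistic model on the within-pair genotype difference. The function name `fit_matched_pairs` and the toy genotypes are hypothetical.

```python
import math


def fit_matched_pairs(pairs, n_iter=50, tol=1e-10):
    """Conditional logistic regression for 1:1 matched pairs, one SNP.

    pairs: list of (case_genotype, control_genotype) tuples, genotypes coded
           as minor-allele counts (0, 1, 2).
    Returns the estimated log odds ratio per copy of the minor allele.

    Assumes discordant pairs exist in both directions; otherwise the
    maximum-likelihood estimate is infinite and Newton's method fails.
    """
    # Pairs with identical genotypes carry no information about beta.
    diffs = [case - ctrl for case, ctrl in pairs if case != ctrl]
    if not diffs:
        raise ValueError("no discordant pairs: the effect is not estimable")

    beta = 0.0
    for _ in range(n_iter):
        score = 0.0  # first derivative of the conditional log-likelihood
        info = 0.0   # negative second derivative (observed information)
        for d in diffs:
            # Probability that the observed case is the case, given the pair.
            p = 1.0 / (1.0 + math.exp(-beta * d))
            score += d * (1.0 - p)
            info += d * d * p * (1.0 - p)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Toy data (made up): 6 pairs where the case carries one extra minor allele,
# 3 pairs where the control does, and 4 concordant pairs.
toy_pairs = [(1, 0)] * 6 + [(0, 1)] * 3 + [(1, 1)] * 4
beta_hat = fit_matched_pairs(toy_pairs)
print("log OR per allele:", beta_hat, "OR:", math.exp(beta_hat))
```

When every discordant pair differs by exactly one allele, the estimate equals the log of the ratio of the two discordant-pair counts, which is the McNemar-type odds ratio. For more covariates or 1:m matching, an established implementation (for example `clogit` in R's survival package) is a safer choice than hand-rolled code.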

How were the people matched? Are they twins? I don't understand the point otherwise, since pairing them up seems either restrictive (if you can't pair everyone) or inaccurate (if you make some bad pairs). I think the more natural approach is to include the covariates (age, BMI, ethnicity, etc.) in the regression. Then you can use all your samples, without a possibly unnatural pairing. I guess this is basically what zx8754 said...

They are paired by age, sex, BMI, and ethnicity. You are right that the pairing is not perfect. I did not design the study... I'm just the poor sob who gets to analyze the data.

11.2 years ago

Well, the logistic regression might be more generally thought of as:

log(odds of disease) = \beta_0 + \beta_1 * X + \beta_2 * BMI + \beta_3 * Ethnicity ...

There can also be interactions, of course. I've never needed to use PLINK, but its documentation suggests that it can handle this sort of model.
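To make the covariate-adjusted model concrete, here is a minimal sketch of fitting it by Newton-Raphson in plain Python. It is not how PLINK does it; the function names `solve` and `fit_logistic`, the toy genotypes, and the centered BMI values are all made up for illustration. It treats the observations as independent and assumes the data are not perfectly separable.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_logistic(rows, y, n_iter=25, tol=1e-10):
    """Fit log(odds) = beta_0 + beta_1 * x_1 + ... by Newton-Raphson.

    rows: one list of covariates per subject (no intercept column).
    y:    0/1 disease status per subject.
    Returns [beta_0, beta_1, ...].
    """
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))  # fitted probability
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Toy data (made up): minor-allele count and BMI centered at 25.
genotype = [0, 0, 0, 1, 1, 1, 2, 2, 0, 1, 2, 0]
bmi_c = [-3, 5, 0, 2, -4, 8, -1, 4, 6, -2, 1, 3]
disease = [0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1]
beta = fit_logistic([[g, b] for g, b in zip(genotype, bmi_c)], disease)
print("intercept, SNP, BMI coefficients:", beta)
```

A categorical covariate such as ethnicity would enter as dummy (0/1) columns, and an interaction as the product of two columns.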

