Question: logistic regression using HLA allelic data
monalc40 wrote (14 months ago):

I have a case-control dataset and I want to perform logistic regression and conditional logistic regression on HLA multi-allelic data, using R. I want to find the effect of specific alleles on a trait. How do I do this, and what format should the data be in? Most examples are based on biallelic SNP data. For instance, at HLA-A I may have up to 30 unique alleles, and at HLA-B it could be 50. Should I recode all the alleles and perform logistic regression on genotype pairs?


If you are merely asking this as a technical question, then you can do this in R via glm(). Your SNP predictors can be encoded categorically, as AA, AB, BB, or continuously, as minor allele counts.

Kevin

(Comment written 14 months ago by Kevin Blighe)
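A minimal sketch of the two encodings Kevin mentions, on simulated data (the sample size, allele frequency and effect size are made up for illustration, not taken from the question):

```r
set.seed(42)
n <- 300
# Hypothetical biallelic SNP: number of copies of minor allele B (0, 1, 2)
minor_count <- rbinom(n, size = 2, prob = 0.3)
# Simulated case-control status with an assumed per-copy log-OR of 0.5
status <- rbinom(n, size = 1, prob = plogis(-1 + 0.5 * minor_count))
genotype <- factor(c("AA", "AB", "BB")[minor_count + 1],
                   levels = c("AA", "AB", "BB"))

# Categorical (genotypic) coding: separate ORs for AB and BB versus AA
fit_cat <- glm(status ~ genotype, family = binomial)
# Continuous (additive) coding: one OR per extra copy of B
fit_add <- glm(status ~ minor_count, family = binomial)

exp(coef(fit_cat))
exp(coef(fit_add))
```

The categorical fit spends two degrees of freedom on the SNP; the additive fit spends one and assumes each extra copy multiplies the odds by the same factor.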
Lemire (Canada) wrote, 14 months ago:

Find a way to produce a data frame containing, for each subject, the count (0, 1 or 2) of each allele that you see, together with the case-control status. E.g. (fake data):

> df

 DX DRB1.0401 DRB1.0404 DRB1.0405 DRB1.0408
1  0         0         0         1         1
2  0         0         0         0         2
3  0         0         0         0         2
4  1         1         0         0         1
5  1         1         0         0         1
6  1         0         1         0         1
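One way to get from typed genotypes to this layout, assuming your raw data has one row per subject with the two HLA-DRB1 alleles in columns named allele1 and allele2 (hypothetical names), is to count the copies of each allele. The input below is invented so that it reproduces the fake table above:

```r
# Hypothetical input: one row per subject, the two DRB1 alleles typed
geno <- data.frame(
  DX      = c(0, 0, 0, 1, 1, 1),
  allele1 = c("0405", "0408", "0408", "0401", "0401", "0404"),
  allele2 = c("0408", "0408", "0408", "0408", "0408", "0408"),
  stringsAsFactors = FALSE
)

# Number of copies (0, 1 or 2) of a given allele carried by each subject
allele_dosage <- function(a1, a2, allele) {
  as.integer(a1 == allele) + as.integer(a2 == allele)
}

alleles <- sort(unique(c(geno$allele1, geno$allele2)))
df <- data.frame(DX = geno$DX)
for (a in alleles) {
  df[[paste0("DRB1.", a)]] <- allele_dosage(geno$allele1, geno$allele2, a)
}
df
```

Every row's allele counts sum to 2, since each subject carries two alleles at the locus.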

If you are interested in the effect of a specific allele, then you can do, e.g.

summary(glm( DX ~ DRB1.0401 , family="binomial", data=df ) )
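To report that result as an odds ratio per copy of the allele, you could exponentiate the coefficient and its Wald confidence interval. A sketch on simulated data (the 400 subjects, allele frequency and effect are assumptions), since the 6-row toy table is too small to give a meaningful estimate:

```r
set.seed(1)
n <- 400
# Simulated dosage (0/1/2) of one allele, plus a case-control status
df <- data.frame(DRB1.0401 = rbinom(n, size = 2, prob = 0.15))
df$DX <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.7 * df$DRB1.0401))

fit <- glm(DX ~ DRB1.0401, family = "binomial", data = df)
# Odds ratios with Wald 95% confidence intervals
or_tab <- exp(cbind(OR = coef(fit), confint.default(fit)))
or_tab
# Wald p-value for the allele
summary(fit)$coefficients["DRB1.0401", "Pr(>|z|)"]
```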

If you are interested in the effect of your HLA locus as a whole, then you can do, e.g.

full<- glm( DX ~ DRB1.0401 + DRB1.0404 + DRB1.0405 + DRB1.0408, 
   family="binomial", data=df ) 
null<- glm( DX ~ 1 , family="binomial", data=df ) 

anova( null, full , test="Chisq")
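One thing to be aware of with the full model: because every subject carries exactly two alleles, the allele counts sum to 2 in every row and are collinear with the intercept. glm() then reports NA for one allele, which effectively becomes the reference, and the likelihood-ratio test has (number of alleles minus 1) degrees of freedom. A sketch on simulated genotypes (allele frequencies and effect are made up):

```r
set.seed(7)
n <- 500
alleles <- c("0401", "0404", "0405", "0408")
# Hypothetical genotypes: two alleles drawn independently per subject
freq <- c(0.2, 0.1, 0.1, 0.6)
a1 <- sample(alleles, n, replace = TRUE, prob = freq)
a2 <- sample(alleles, n, replace = TRUE, prob = freq)
df <- data.frame(sapply(alleles, function(a) (a1 == a) + (a2 == a)))
names(df) <- paste0("DRB1.", alleles)
df$DX <- rbinom(n, 1, plogis(-0.5 + 0.6 * df$DRB1.0401))

full <- glm(DX ~ DRB1.0401 + DRB1.0404 + DRB1.0405 + DRB1.0408,
            family = "binomial", data = df)
null <- glm(DX ~ 1, family = "binomial", data = df)
coef(full)  # DRB1.0408 is NA: the four counts always sum to 2
lrt <- anova(null, full, test = "Chisq")
lrt         # 3 df for 4 alleles
```

Dropping one allele from the formula yourself gives the same test and makes the reference allele explicit.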

Add covariates to both models if deemed necessary.
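The question also asks about conditional logistic regression. A sketch of both variants with covariates, assuming a hypothetical 1:2 matched design in which the column set identifies each matched set. clogit() comes from the survival package; a covariate that was matched on (here sex) is constant within a set and drops out of the conditional likelihood, so it is left out of that model:

```r
library(survival)  # provides clogit()

set.seed(3)
n_sets <- 150
# Hypothetical 1:2 matched design: each set = 1 case + 2 controls, matched on sex
df <- data.frame(
  set = rep(seq_len(n_sets), each = 3),
  DX  = rep(c(1, 0, 0), times = n_sets),
  sex = rep(rbinom(n_sets, 1, 0.5), each = 3),
  age = rnorm(3 * n_sets, mean = 50, sd = 10)
)
# Cases carry the allele somewhat more often (assumed effect)
df$DRB1.0401 <- rbinom(nrow(df), 2, ifelse(df$DX == 1, 0.25, 0.15))

# Unconditional logistic regression with covariates
fit_glm <- glm(DX ~ DRB1.0401 + sex + age, family = "binomial", data = df)
# Conditional logistic regression: matched sets enter via strata(set)
fit_clogit <- clogit(DX ~ DRB1.0401 + age + strata(set), data = df)
summary(fit_clogit)
```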


The problem has been solved, thanks.

(Reply written 12 months ago by monalc40)

To expand on this, how would it look if you did have a covariate (sex)? For example, say I have a multiallelic locus with three possible SNPs:

 set.seed(1)
 data <- data.frame("snp1"=runif(n=200, min=0, max=2),
                  "snp2"=c(runif(n=50, min=0, max=.2),
                           runif(n=50, min=0, max=.2),
                           runif(n=50, min=1.5, max=2),
                           runif(n=50, min=1.5, max=2)),
                  "snp3"=c(runif(n=50, min=0, max=.2),
                          runif(n=50, min=0, max=.2),
                          runif(n=50, min=1.5, max=2),
                          runif(n=50, min=1.5, max=2)),
                  # sex coded 0/1, one value per subject (200 rows)
                  "sex"=rbinom(n=200, size=1, prob=0.5),
                   "disease"=c(rbinom(150, 1, 0.1),
                               rbinom(50, 1, 0.9)))

To test the locus as a whole, I would do this:

multi_snp_full <- glm(disease ~ snp2 + snp3 + sex, data=data, family="binomial")
null <- glm(disease ~ sex, data=data, family="binomial") 
anova( null, multi_snp_full , test="Chisq")

If I wanted to go back and test snp2 specifically, would it just be this (with no LR test)?

single_snp_test <- glm(disease ~ snp2 + sex, data=data, family="binomial")
(Reply written 9 weeks ago by curious470)
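A sketch of the two usual ways to test snp2 adjusted for sex, on data simulated along the lines of the example above (the seed and the 0/1 sex coding are assumptions). The Wald p-value for the coefficient comes straight from summary(); an LR test compares against the sex-only model and tests the same 1-df hypothesis, and in reasonably large samples the two usually agree:

```r
set.seed(10)
n <- 200
data <- data.frame(
  snp2    = c(runif(100, 0, 0.2), runif(100, 1.5, 2)),
  sex     = rbinom(n, 1, 0.5),
  disease = c(rbinom(150, 1, 0.1), rbinom(50, 1, 0.9))
)
null <- glm(disease ~ sex, data = data, family = "binomial")
single_snp_test <- glm(disease ~ snp2 + sex, data = data, family = "binomial")

# Wald test for snp2, adjusted for sex
summary(single_snp_test)$coefficients["snp2", ]
# Likelihood-ratio test for snp2 against the sex-only model
lrt_snp2 <- anova(null, single_snp_test, test = "Chisq")
lrt_snp2
```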