Question: Difficulty replicating likelihood ratio test from RNA-seq paper
1
gravatar for James Ashmore
18 months ago by
James Ashmore2.0k
UK/Edinburgh/MRC Centre for Regenerative Medicine
James Ashmore2.0k wrote:

I have RNA-seq data from two treatment groups (three replicates each) and I am trying to replicate the likelihood ratio test outlined on page 1516 of this paper by Marioni et al: http://www.ncbi.nlm.nih.gov/pubmed/18550803. So far I have been able to fit a Poisson regression model for each gene using the following function and command:

poissonGLM <- function(gene) {
  data <- data.frame(treatment=pdata$treatment, count=gene)
  model <- glm(count ~ treatment, data=data, family=poisson)
  mean <- predict(model.pois, type="response"),
  stdev <- predict(model.pois, type="response", se.fit=T)$se.fit
}

results <- apply(edata, 1, poissonGLM)

For each gene, this computes the expected mean and expected variance for each treatment group. However, I'm unsure how to use this model to compute the maximum likelihood estimates both under the null and alternative hypothesis (as described in the paper), and generate a P-value from the Chi-squared distribution.

rna-seq R • 653 views
ADD COMMENTlink modified 18 months ago by Collin390 • written 18 months ago by James Ashmore2.0k
3
gravatar for Collin
18 months ago by
Collin390
United States
Collin390 wrote:

anova(model, test="Chi") should give you the likelihood ratio test result comparing the effect of treatment vs intercept only model. This is the same as if you had manually fit a glm with treatment and a second glm with intercept only and computed their deviance and compared it to a chi-squared distribution (d.f. = 1 in your case).

ADD COMMENTlink written 18 months ago by Collin390

So the intercept only model is what Marioni et al. refer to as the null hypothesis, while the model with the treatment group specified refers to the alternative hypothesis?

glm(count ~ 1, data=data, family=poisson) # null hypothesis
glm(count ~ treatment, data=data, family=poisson) # alternative hypothesis

The P-value is therefore the following?

test <- anova(model, test="Chi")
test$`Pr(>Chi)`

They go on to estimate fold change by fitting the model under the alternative hypothesis, which would mean calculating the ratio of the expected means I've already computed?

treatment1_mean <- mean[1]
treatment2_mean <- mean[2]
treatment1_mean / treatment2_mean
ADD REPLYlink modified 18 months ago • written 18 months ago by James Ashmore2.0k
1

Yes the null hypothesis is the intercept only model. Likelihood ratio tests use the log-likelihood and not the predicted count. That means using the logLik function. This means the statistic is -2*(logLik(model.null) - logLik(model.treatment)) where the difference is subtraction because the log of ratios equal the log of the numerator minus the log of the denominator (a property of logarithms).The p-value is just the probability of observing that chi-squared statistic or greater from a chi-squared distribution with d.f. = 1 in your case. All the above work should give you the same result as the anova function I had mentioned initially. So yes the p-value can be extracted like you did above. Presumably thats what they mean by fold change but I haven't read through that paper before.

ADD REPLYlink modified 18 months ago • written 18 months ago by Collin390
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 691 users visited in the last hour