Question

Difficulty replicating likelihood ratio test from RNA-seq paper

1

Entering edit mode

9.3 years ago

James Ashmore ★ 3.5k

I have RNA-seq data from two treatment groups (three replicates each) and I am trying to replicate the likelihood ratio test outlined on page 1516 of this paper by Marioni et al: http://www.ncbi.nlm.nih.gov/pubmed/18550803. So far I have been able to fit a Poisson regression model for each gene using the following function and command:

poissonGLM <- function(gene) {
  data <- data.frame(treatment=pdata$treatment, count=gene)
  model <- glm(count ~ treatment, data=data, family=poisson)
  mean <- predict(model.pois, type="response"),
  stdev <- predict(model.pois, type="response", se.fit=T)$se.fit
}

results <- apply(edata, 1, poissonGLM)

For each gene, this computes the expected mean and expected variance for each treatment group. However, I'm unsure how to use this model to compute the maximum likelihood estimates both under the null and alternative hypothesis (as described in the paper), and generate a P-value from the Chi-squared distribution.

RNA-Seq R • 2.7k views

ADD COMMENT • link updated 9.3 years ago by Collin ▴ 1000 • written 9.3 years ago by James Ashmore ★ 3.5k

score 3 · Accepted Answer · 2016-03-31

3

Entering edit mode

9.3 years ago

Collin ▴ 1000

anova(model, test="Chi") should give you the likelihood ratio test result comparing the effect of treatment vs intercept only model. This is the same as if you had manually fit a glm with treatment and a second glm with intercept only and computed their deviance and compared it to a chi-squared distribution (d.f. = 1 in your case).

ADD COMMENT • link 9.3 years ago by Collin ▴ 1000

0

Entering edit mode

So the intercept only model is what Marioni et al. refer to as the null hypothesis, while the model with the treatment group specified refers to the alternative hypothesis?

glm(count ~ 1, data=data, family=poisson) # null hypothesis
glm(count ~ treatment, data=data, family=poisson) # alternative hypothesis

The P-value is therefore the following?

test <- anova(model, test="Chi")
test$`Pr(>Chi)`

They go on to estimate fold change by fitting the model under the alternative hypothesis, which would mean calculating the ratio of the expected means I've already computed?

treatment1_mean <- mean[1]
treatment2_mean <- mean[2]
treatment1_mean / treatment2_mean

ADD REPLY • link 9.3 years ago by James Ashmore ★ 3.5k

1

Entering edit mode

Yes the null hypothesis is the intercept only model. Likelihood ratio tests use the log-likelihood and not the predicted count. That means using the logLik function. This means the statistic is -2*(logLik(model.null) - logLik(model.treatment)) where the difference is subtraction because the log of ratios equal the log of the numerator minus the log of the denominator (a property of logarithms).The p-value is just the probability of observing that chi-squared statistic or greater from a chi-squared distribution with d.f. = 1 in your case. All the above work should give you the same result as the anova function I had mentioned initially. So yes the p-value can be extracted like you did above. Presumably thats what they mean by fold change but I haven't read through that paper before.

ADD REPLY • link 9.3 years ago by Collin ▴ 1000