Question: Difficulty replicating likelihood ratio test from RNA-seq paper
1
James Ashmore2.8k wrote:

I have RNA-seq data from two treatment groups (three replicates each) and I am trying to replicate the likelihood ratio test outlined on page 1516 of this paper by Marioni et al: http://www.ncbi.nlm.nih.gov/pubmed/18550803. So far I have been able to fit a Poisson regression model for each gene using the following function and command:

``````poissonGLM <- function(gene) {
data <- data.frame(treatment=pdata\$treatment, count=gene)
model <- glm(count ~ treatment, data=data, family=poisson)
mean <- predict(model.pois, type="response"),
stdev <- predict(model.pois, type="response", se.fit=T)\$se.fit
}

results <- apply(edata, 1, poissonGLM)
``````

For each gene, this computes the expected mean and expected variance for each treatment group. However, I'm unsure how to use this model to compute the maximum likelihood estimates both under the null and alternative hypothesis (as described in the paper), and generate a P-value from the Chi-squared distribution.

rna-seq R • 1.4k views
modified 3.8 years ago by Collin680 • written 3.8 years ago by James Ashmore2.8k
3
Collin680 wrote:

anova(model, test="Chi") should give you the likelihood ratio test result comparing the effect of treatment vs intercept only model. This is the same as if you had manually fit a glm with treatment and a second glm with intercept only and computed their deviance and compared it to a chi-squared distribution (d.f. = 1 in your case).

So the intercept only model is what Marioni et al. refer to as the null hypothesis, while the model with the treatment group specified refers to the alternative hypothesis?

``````glm(count ~ 1, data=data, family=poisson) # null hypothesis
glm(count ~ treatment, data=data, family=poisson) # alternative hypothesis
``````

The P-value is therefore the following?

``````test <- anova(model, test="Chi")
test\$`Pr(>Chi)`
``````

They go on to estimate fold change by fitting the model under the alternative hypothesis, which would mean calculating the ratio of the expected means I've already computed?

``````treatment1_mean <- mean
treatment2_mean <- mean
treatment1_mean / treatment2_mean
``````
1

Yes the null hypothesis is the intercept only model. Likelihood ratio tests use the log-likelihood and not the predicted count. That means using the `logLik` function. This means the statistic is `-2*(logLik(model.null) - logLik(model.treatment))` where the difference is subtraction because the log of ratios equal the log of the numerator minus the log of the denominator (a property of logarithms).The p-value is just the probability of observing that chi-squared statistic or greater from a chi-squared distribution with d.f. = 1 in your case. All the above work should give you the same result as the anova function I had mentioned initially. So yes the p-value can be extracted like you did above. Presumably thats what they mean by fold change but I haven't read through that paper before.