Question

Off topic:Applying linear mixed model for RNA-Seq data

1

Entering edit mode

4.7 years ago

Assa Yeroslaviz ★ 1.8k

We have an RNA-Seq data set from mouse with three conditions in triplicates. For better understanding of the reaction, each of the animal was weighted and its urea levels were measured beforehand.

We would like to apply a linear mixed-model to the data-set in order to analyze the influence of weight and urea conc. on the expression intensities of genes in my RNA-Seq experiment.

The weight.score and urea.score were calculated based on multiple measurements on different days

We're trying to identify genes, for which a correlation between the urea conc., body weight and their expression behavior exits.

Our first try was to apply a Pearson correlation to the total.score, we calculated for each of the genes (see script below). But in the review we were asked to do to apply a mixed linear-model to test the influence of the body weight and urea concentration on those expressions values and to examine the degree of contribution of each gene to the outcome score. and here I would like to ask for your help.

We would like to include both the urea.score and the weight.score in the mixed-model (Or do I need to include the actual values in the model?). I'm not sure how to do so.

Our question is - Is there an association between the body weight and urea concentration and the intensities of the gene expression?

In the model I regards the gene expression as my dependent variable, while urea.score and weight.score are the random variables. Then fitting the mixed model with these effects would be somthing like that lmer(Gene.X ~ (1|urea.score) + (1|weight.score), data = mergedCorr.norm).

But when running the linear model for urae and body-weight separately I can see a significance, while in the mixed-model this is no longer apparent. Am I setting the parameters correctly? Can I sue these two parameters as random variables, even though I have only three different levels?

I would appreciate any help or suggestions as to how I should apply these here.

thanks Assa

Script used:

# prepare data
library(lme4)
library(ggplot2)

Scores.RiboM <- structurelistAnimal.ID = structure(c(5L, 6L, 7L, 4L, 9L, 10L, 
2L, 1L, 8L, 3L), .Label = c("C20", "C22", "C24", "CR4.1", "CR4.2", 
"CR4.3", "CR4.4", "HP11", "HP12", "HP14"), class = "factor"), 
group = c(3L, 3L, 3L, 3L, 2L, 2L, 1L, 1L, 2L, 1L), urea = structure(c(8L, 
10L, 3L, 2L, 1L, 4L, 6L, 5L, 9L, 7L), .Label = c("105.57", 
 "110.71", "125.89", "195.87", "321.43", "329.92", "340.81", 
 "35.59", "363.37", "94.76"), class = "factor"), weight = structure(c(9L, 
 6L, 4L, 2L, 8L, 7L, 3L, 5L, 10L, 1L), .Label = c("20", "20.1", 
 "20.7", "21.9", "22.1", "22.6", "23", "23.3", "23.5", "24"
 ), class = "factor"), urea.score = c(1L, 1L, 1L, 1L, 1L, 
2L, 3L, 4L, 4L, 4L), weight.score = c(1L, 1L, 1L, 2L, 3L, 
3L, 3L, 3L, 3L, 4L), total.score = c(2L, 2L, 2L, 3L, 4L, 
5L, 6L, 7L, 7L, 8L)), row.names = c(8L, 9L, 10L, 7L, 5L, 
6L, 2L, 1L, 4L, 3L), class = "data.frame")

# scale() centers the data (the column mean is subtracted from the values in the column) and then scales it (the centered column values are divided by the column’s standard deviation).
gene.scaled <- structure(c(1.14326057352769, -0.331767048798829, 2.0634403441122, 
0.0974979558410874, -0.477899816335822, 0.634079211640983, -0.487033114306884, 
-1.06014756199103, -0.982514529237001, -0.598916014452395), .Dim = c(10L, 
1L), .Dimnames = list(c("C20", "C22", "C24", "HP11", "HP12", 
"HP14", "CR4.1", "CR4.2", "CR4.3", "CR4.4"), "Gene.X"))

# Correlation
mergedCorr.norm <- merge(Scores.RiboM, gene.scaled, by.x=1, by.y=0)
mergedCorr.norm <- mergedCorr.norm[order(mergedCorr.norm$total.score, decreasing = FALSE),]
CorValue.pear <- cor.test(x=mergedCorr.norm$total.score, y=mergedCorr.norm[,8], method = "pearson")
CorValue.spear <- cor.test(x=mergedCorr.norm$total.score, y=mergedCorr.norm[,8], method = "spearman")

# Plotting
ggplot(mergedCorr.norm, aes(x=total.score, y=mergedCorr.norm[,8], label=mergedCorr.norm$Animal.ID)) +
        geom_point(shape=1) +
        geom_smooth(method=lm) +
        ggtitle(names(mergedCorr.norm)[4])+
        scale_y_continuous(name="norm. read counts")+
        geom_text(size=3, hjust=-0.1, vjust=-0.4,colour="darkgreen")+ 
        annotate("text", x = 8, y = max(mergedCorr.norm[,8]),label=paste("Pear. r= ", format(CorValue.pear$estimate, digits = 4), "\n Spear. r = ", format(CorValue.spear$estimate, digits = 4), sep=""), colour=ifelse(CorValue.pear$estimate>=0.9, "blue","red")) +
        theme_bw(base_size = 12, base_family = "")

# linear model 
lm.weight <- lm(Gene.X ~ weight.score, data = mergedCorr.norm)
lm.urea <- lm(Gene.X ~ urea.score, data = mergedCorr.norm)

# mixed-model
mixed.Exp <- lmer(Gene.X ~ (1|urea.score) + (1|weight.score), data = mergedCorr.norm)
stargazer(mixed.Exp, type = "text",
    digits = 3,
    star.cutoffs = c(0.05, 0.01, 0.001),
    digit.separator = "")

RNA-Seq linear-mixed-model linear-regression lme4 • 967 views

ADD COMMENT • link 4.7 years ago by Assa Yeroslaviz ★ 1.8k