Question: Lmer() for RNA-seq with batch as random effect: how to prepare input?
gravatar for Ekarl2
3.1 years ago by
Ekarl290 wrote:

I have an RNA-seq dataset with two treatments (A and B) and four environments in a classic +/+, +/-, -/+, -/- design with four replicates for environment for a total of 16 samples. There are also four batches: 1 replicate of each environment occurs in the first batch, 2 replicate of each environment in second batch and so on.

So far, I have used the glm methods of EdgeR with the model ~batch+A*B and gotten out the effects of A and B and interactions.

However, more senior researchers wants me to provide them with results for a mixed model approach and use the lmer() function from the lme4 package to model the batch as a random factor instead. Their concern was, as I have understood it, that the effects of B might be drowned out if batch is treated as a fixed factor. So they want to find how much variation in B contributes to the observed expression values.

From reading the manuals, vignettes and tutorials, I understand the basic idea of the lmer() function on their example datasets, but the input format seems widely different from mine.

Their format:

> head(Dyestuff)
  Batch Yield
1     A  1545
2     A  1440
3     A  1440
4     A  1520
5     A  1580
6     B  1540

> str(Dyestuff)
'data.frame':   30 obs. of  2 variables:
 $ Batch: Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 2 2 2 2 2 ...
 $ Yield: num  1545 1440 1440 1520 1580 ...

> summary(Dyestuff)
 Batch     Yield
 A:5   Min.   :1440
 B:5   1st Qu.:1469
 C:5   Median :1530
 D:5   Mean   :1528
 E:5   3rd Qu.:1575
 F:5   Max.   :1635

My format, on the other hand, is a matrix with 16 columns and many thousands rows, each row is one gene and each column is a sample (a standard RNA-seq input format). I know how to define experimental factors for this experimental design, but when I try to run this code:


# Load libraries.

# Loads counts file from e. g. RSEM
data.set <- read.table("genes.counts.matrix")
data.set <- round(data.set)

# Creates factors for A, B and batch effect
A <- factor(substring(colnames(data.set),1,6))
B <- factor(substring(colnames(data.set),7,10))
batch <- factor(substring(colnames(data.set),12,12))

fm01 <- lmer(data.set ~ A*B  + (1|batch))

I get this error message:

Error in model.frame.default(drop.unused.levels = TRUE, formula = data.set ~  :
  invalid type (list) for variable 'data.set'
Calls: lmer ... <Anonymous> -> eval -> eval -> model.frame -> model.frame.default
Execution halted

I suspect this is because the input format is wrong. Making it a data.frame() did not help. What are some productive approaches to this issue? Make a new table with log2 FC for B as the response variable (from, say, only those eight samples were A is unchanged) and calculate it from each batch independently?

batch rna-seq mixed model lmer • 1.7k views
ADD COMMENTlink modified 6 months ago by RamRS20k • written 3.1 years ago by Ekarl290
gravatar for Devon Ryan
3.1 years ago by
Devon Ryan88k
Freiburg, Germany
Devon Ryan88k wrote:

You'll end up needing to make a data frame with the relevant design and then mapply() a function with that to the matrix, since as is you're trying to fit the whole matrix in one go rather than fitting each row. Play around with one row of your matrix and get lmer to work with that and then you'll have a good starting point for scaling up to the full matrix.

As an aside, this is largely a waste of your time (your senior colleagues might want to look up how empirical Bayes methods are any some respects similar to what you end up doing in a mixed-effect model).

ADD COMMENTlink modified 6 months ago by RamRS20k • written 3.1 years ago by Devon Ryan88k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1071 users visited in the last hour