I have an RNA-seq dataset with two treatments (A and B) and four environments in a classic +/+, +/-, -/+, -/- design with four replicates for environment for a total of 16 samples. There are also four batches: 1 replicate of each environment occurs in the first batch, 2 replicate of each environment in second batch and so on.
So far, I have used the glm methods of EdgeR with the model ~batch+A*B and gotten out the effects of A and B and interactions.
However, more senior researchers wants me to provide them with results for a mixed model approach and use the lmer() function from the lme4 package to model the batch as a random factor instead. Their concern was, as I have understood it, that the effects of B might be drowned out if batch is treated as a fixed factor. So they want to find how much variation in B contributes to the observed expression values.
From reading the manuals, vignettes and tutorials, I understand the basic idea of the lmer() function on their example datasets, but the input format seems widely different from mine.
> head(Dyestuff) Batch Yield 1 A 1545 2 A 1440 3 A 1440 4 A 1520 5 A 1580 6 B 1540 > str(Dyestuff) 'data.frame': 30 obs. of 2 variables: $ Batch: Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 2 2 2 2 2 ... $ Yield: num 1545 1440 1440 1520 1580 ... > summary(Dyestuff) Batch Yield A:5 Min. :1440 B:5 1st Qu.:1469 C:5 Median :1530 D:5 Mean :1528 E:5 3rd Qu.:1575 F:5 Max. :1635
My format, on the other hand, is a matrix with 16 columns and many thousands rows, each row is one gene and each column is a sample (a standard RNA-seq input format). I know how to define experimental factors for this experimental design, but when I try to run this code:
#!/usr/bin/Rscript # Load libraries. library(lme4) # Loads counts file from e. g. RSEM data.set <- read.table("genes.counts.matrix") data.set <- round(data.set) # Creates factors for A, B and batch effect A <- factor(substring(colnames(data.set),1,6)) B <- factor(substring(colnames(data.set),7,10)) batch <- factor(substring(colnames(data.set),12,12)) fm01 <- lmer(data.set ~ A*B + (1|batch))
I get this error message:
Error in model.frame.default(drop.unused.levels = TRUE, formula = data.set ~ : invalid type (list) for variable 'data.set' Calls: lmer ... <Anonymous> -> eval -> eval -> model.frame -> model.frame.default Execution halted
I suspect this is because the input format is wrong. Making it a
data.frame() did not help. What are some productive approaches to this issue? Make a new table with log2 FC for B as the response variable (from, say, only those eight samples were A is unchanged) and calculate it from each batch independently?