I've carried out a number of differential expression analyses using the Limma-Voom pipeline for a rather complicated data set. I'd just like to clarify that my approach detailed below is appropriate.
The design of the experiment is very much like those described in sections 9.6 and 9.7 of the Limma manual (time-course experiments and multi-level experiments).
Our data set comprises of RNA-Seq data from patients undertaking treatment with one of two drugs (drug A and drug B). For each subject in our data set, samples were collected at three time points from two different tissue types on the body. Samples from diseased tissue and healthy tissue were collected at week 0 (before commencing treatment), week 2 (two weeks after commencing treatment), and week 10 (ten weeks after commencing treatment).
In addition, each patient was genotyped for an allele of interest (we'll call it Allele.X), and designated as Allele.X-positive (1 or 2 copies of allele) or Allele.X-negative (0 copies of allele).
Therefore, a snippet of the data set looks something like this:
Subject Sample Drug Time Tissue Allele.X Subject.01 Sample.01 Drug.A Week.0 Diseased Positive Subject.01 Sample.02 Drug.A Week.2 Diseased Positive Subject.01 Sample.03 Drug.A Week.10 Diseased Positive Subject.01 Sample.04 Drug.A Week.0 Healthy Positive Subject.01 Sample.05 Drug.A Week.2 Healthy Positive Subject.01 Sample.06 Drug.A Week.10 Healthy Positive Subject.02 Sample.07 Drug.B Week.0 Diseased Negative Subject.02 Sample.08 Drug.B Week.2 Diseased Negative Subject.02 Sample.09 Drug.B Week.10 Diseased Negative Subject.02 Sample.10 Drug.B Week.0 Healthy Negative Subject.02 Sample.11 Drug.B Week.2 Healthy Negative Subject.02 Sample.12 Drug.B Week.10 Healthy Negative
In total, 35 patients were treated with drug A and 31 were treated with drug B.
In the drug A group, 20 were Allele.X-negative and 15 were were Allele.X-positive. In the drug B group, 18 were Allele.X-negative and 13 were Allele.X-positive.
I would like to identify genes that change in expression over the course of treatment in both tissue types in both drug groups, irrespective of Allele.X status, i.e: week 2 and week 10 compared to week 0 in ALL of the drug A group and ALL of the drug B group.
I would also like to do the above comparisons, but also taking into account Allele.X status.
My approach do this has been to analyse each drug group separately, so when looking at drug A for example, this is done by subsetting the clinical data and expression data...
drug = "Drug.A" clinical.data <- clinical.data[clinical.data$Drug == "Drug.A",] exp.data <- exp.data[, clinical.data$Sample]
...and creating an interaction variable that combines tissue, time, and Allele.X status...
clinical.data$Tissue.Time.Allele.X = paste(clinical.data$Tissue, clinical.data$Time, clinical.data$Allele.X, sep = ".")
...and an interaction variable that combines subject and tissue...
clinical.data$Subject.Tissue = paste(clinical.data$Subject, clinical.data$Tissue, sep = ".")
The Tissue.Time.Allele.X variable is used to define the design...
limma_design = model.matrix(~ 0 + Tissue.Time.Allele.X, data = keep_samp) voomed_counts = voom(edata, limma_design)
...and the Subject.Tissue variable is used to for sample blocking...
# Block correlation my_cor = duplicateCorrelation(voomed_counts, limma_design, block = clinical.data$Subject.Tissue) fit = lmFit(voomed_counts, limma_design, correlation = my_cor$cor, block = clinical.data$Subject.Tissue)
The makeContrasts function is used to extract the relevant contrasts...
# Extract contrasts of interest my_contrasts <- makeContrasts( Diseased_week2_vs_week0_All = ((15 * Diseased.Week.2.Positive + 20 * Diseased.Week.2.Negative) / 35) - ((15 * Diseased.Week.0.Positive + 20 * Diseased.Week.0.Negative) / 35), Diseased_week10_vs_week0_All = ((15 * Diseased.Week.10.Positive + 20 * Diseased.Week.10.Negative) / 35) - ((15 * Diseased.Week.0.Positive + 20 * Diseased.Week.0.Negative) / 35), Healthy_week2_vs_week0_All = ((15 * Diseased.Week.2.Positive + 20 * Diseased.Week.2.Negative) / 35) - ((15 * Diseased.Week.0.Positive + 20 * Diseased.Week.0.Negative) / 35), Healthy_week10_vs_week0_All = ((15 * Healthy.Week.10.Positive + 20 * Healthy.Week.10.Negative) / 35) - ((15 * Healthy.Week.0.Positive + 20 * Healthy.Week.0.Negative) / 35), Diseased_week2_vs_week0_Pos = Diseased.Week.2.Positive - Diseased.Week.0.Positive, Diseased_week2_vs_week0_Neg = Diseased.Week.2.Negative - Diseased.Week.0.Negative, Diseased_week10_vs_week0_Pos = Diseased.Week.10.Positive - Diseased.Week.10.Positive, Diseased_week10_vs_week0_Neg = Diseased.Week.10.Negative - Diseased.Week.10.Negative, Healthy_week2_vs_week0_Pos = Healthy.Week.2.Positive - Healthy.Week.0.Positive, Healthy_week2_vs_week0_Neg = Healthy.Week.2.Negative - Healthy.Week.0.Negative, Healthy_week10_vs_week0_Pos = Healthy.Week.10.Positive - Healthy.Week.10.Positive, Healthy_week10_vs_week0_Neg = Healthy.Week.10.Negative - Healthy.Week.10.Negative levels = limma_design ) # Compute contrasts fit <- contrasts.fit(fit, my_contrasts) fit <- eBayes(fit)
Hopefully it is clear that for the contrasts in all of the patients within the drug group, the average expression (weighted by sample number) of the Allele.X-positive and negative patients is being contrasted.
So my main question - is the above approach valid?
Thank you! And sorry for long post, but hopefully it is clear what I'm getting at.