Question

Design for complex RNA-Seq experiment using Deseq2

0

Entering edit mode

10 weeks ago

Ezequiel • 0

I would like to reach to the community to get some input regarding a particular experimental design that is troubling me for RNA-Seq analysis. I will discuss this with a statistician, but I also want to make sure that I am modeling it right in DESeq (edgeR or limma for that matter).

All patients are taken a biopsy at baseline of the tumor (Time=PRE), afterwards are randomized to TREATMENT or CONTROL. After some time another sample is collected (Time: POST). I want to compare the effect of the Treatment, accounting for the baseline of each patient.I have followed the DESEQ2 vignette Section: "Group-specific condition effects, individuals nested within groups". Briefly:

metadata <- data.frame(
  SAMPLE = c("PATIENT6_PRE", "PATIENT6_POST", "PATIENT12_PRE", "PATIENT12_POST",
             "PATIENT5_PRE", "PATIENT5_POST", "PATIENT8_PRE", "PATIENT8_POST",
             "PATIENT15_PRE", "PATIENT15_POST", "PATIENT16_PRE", "PATIENT16_POST",
             "PATIENT17_PRE", "PATIENT17_POST", "PATIENT19_PRE", "PATIENT19_POST"),
  Time = c("PRE", "POST", "PRE", "POST", "PRE", "POST", "PRE", "POST", "PRE", "POST",
           "PRE", "POST", "PRE", "POST", "PRE", "POST"),
  Patient = c("PATIENT6", "PATIENT6", "PATIENT12", "PATIENT12", "PATIENT5", "PATIENT5",
              "PATIENT8", "PATIENT8", "PATIENT15", "PATIENT15", "PATIENT16", "PATIENT16",
              "PATIENT17", "PATIENT17", "PATIENT19", "PATIENT19"),
  TUMOR_TYPE = c("A", "A", "B", "B", "C", "C", "C", "C", "D", "D", "B", "B", "D", "D", "B", "B"),
  Intervention = c("TREATMENT", "TREATMENT", "TREATMENT", "TREATMENT", "TREATMENT", "TREATMENT",
                   "CONTROL", "CONTROL", "TREATMENT", "TREATMENT", "CONTROL", "CONTROL",
                   "TREATMENT", "TREATMENT", "CONTROL", "CONTROL"),
  ind.n = c(1, 1, 2, 2, 3, 3, 1, 1, 4, 4, 2, 2, 5, 5, 3, 3)
)

metadata at a glipmse

dds<- DESeqDataSetFromMatrix(countData = round(data),
  colData = metadata,
  design = ~Patient + Time)

design <- model.matrix(~ Intervention + Intervention:ind.n + Intervention:Time, data = metadata)
design_clean <- design[,!colSums(design)==0]
dds <- DESeq(dds, full= design_clean, betaPrior = FALSE)
resultsNames(dds)

[1] "Intercept"              "InterventionCONTROL"                 "InterventionTREATMENT.ind.n2"          "InterventionCONTROL.ind.n2"          "InterventionTREATMENT.ind.n3"          "InterventionCONTROL.ind.n3"         
 [7] "InterventionTREATMENT.ind.n4"          "InterventionTREATMENT.ind.n5"          "InterventionTREATMENT.POST"   "InterventionCONTROL.POST"

res <- results(dds, contrast = list(c("InterventionTREATMENT.TissueResection","InterventionCONTROL.TissueResection")))

If this is correct I would be comparing the effect of Treatment, accounting for the baseline for each patient?. I also wanted to account for the tumor type but, since I am already "blocking" the Patient term, would this be necessary?. I would really appreciate any input on how to analyze this design! Thanks a lot!.

Model.Matrix RNA-Seq Limma DESeq2 edgeR • 439 views

ADD COMMENT • link 9 weeks ago by Ezequiel • 0

score 0 · Answer 1 · 2024-05-17

0

Entering edit mode

10 weeks ago

swbarnes2 14k

This is a rather small experiment, I don't know that you can establish a meaningful baseline for each patient with a single pre sample, and I don't think you have the power to model patient and tumor type and time point all at the same time.

What does the PCA tell you are the most important drivers of difference between all the samples? Because if the tumors are rather different from each other, then I'm not sure there's much point to combining them all, and then you have 4 studies that are too small to be worth anything.

ADD COMMENT • link 10 weeks ago by swbarnes2 14k

0

Entering edit mode

I completely agree with most points. The PCA shows that the biggest driver of difference is the patients themselves, with PRE and POST timepoints for each patient clustering closer to each other. I also should point out that the "tumor type" is actually different subtypes of the same cancer. I am still interested to see if the treatment produced a common gene signature or some common changes among these cases. Is there any particular model you think I could you to tease this out? Thanks a lot!

ADD REPLY • link 9 weeks ago by Ezequiel • 0