Multiple factors in DESeq2
2
2
Entering edit mode
9 days ago

Hi everyone,

I have an RNA-seq experiment with the following factors - $timepoint (0,2,6,12), $outcome (complete, incomplete), and patient (1:10)

I want to make the following plots to ask the questions

  1. Does the gene SCAR1 differ between weeks 0 and 2 across all samples.
  2. In week 0, is SCAR1 expression different between complete and incomplete responders.
  3. In week 2, is SCAR1 expression different between complete and incomplete responders.

enter image description here

Here is an example dataframe:

data = data.frame(
  counts =  rnorm(40, mean = 100, sd = 10),

  timepoint = c(
    rep('0',10),
    rep('2',10),
    rep('6',10),
    rep('12',10)
    ),
  outcome = rep(c(rep('complete',5), rep('incomplete',5)), 4),
  patient = rep(
    rep(c('1','2','3','4','5','6','7','8','9','10'), 1), 4
  )
)

How can I use DESeq2 for this? I am really struggling to get my head around the use of the design formula and contrasts. I've read the docs and read other forum questions and I still can't understand it. I'd greatly appreciate any help at all.

deseq2 • 511 views
ADD COMMENT
0
Entering edit mode

The closest I have in my mind is the following:

dds <- DESeqDataSetFromMatrix(
  countData = em,
  colData = meta,
  design = ~ timepoint + outcome + timepoint:outcome
)
dds <- DESeq(dds)

#compare COMPLETE vs INCOMPLETE in week 0
results_outcome_week0 <- results(dds, contrast = list(
  c("outcome0", "complete.outcome0")
))

#compare COMPLETE vs INCOMPLETE in week 2
results_outcome_week2 <- results(dds, contrast = list(
  c("outcome2", "complete.outcome2")
))

However, I don't understand whether this result is written properly. This design also does not correct for patient id of each sample.

Thank you for any help.

ADD REPLY
1
Entering edit mode
2 days ago

Just an update, DESeq2 works using GLM and cannot consider random factors like patient ID when patient ID groups are unique to outcome factor (patient 1 is only in complete group for example and never in incomplete). Therefore I can't make plot 2 above using DESeq2 while accounting for patient ID. Instead I am using the package GLMMSeq which uses mixed effect models to account for patient ID. Then I pull out genes with an interaction at week 2 since that is my research question. With this package you can make plots like the one below. Thanks everyone for your help!

enter image description here

ADD COMMENT
3
Entering edit mode
9 days ago

First note that your example data contains data for only a single gene, and DESeq requires you to have data from a large number of genes to work properly.

If what you really want is to ask the three questions you've posted, then the solution is easy - simply break it down into the three separate two condition tests, and test them separately:

day0_v_day2_counts <- em[,meta$day %in% c('0','2')]

dds_day <- DESeqDataSetFromMatrix(
  countData = em,
  colData = meta[,meta$day %in% c('0','2')],
  design = ~ timepoint 
)
dds_day <- DESeq(dds_day)
   dds_day_results <- results(dds_day)


dds_day0 <- DESeqDataSetFromMatrix(
    countData = em[, meta$day==0],
   colData = meta[, meta$day==0],
  design= ~outcome)
dds_day0 <- DEseq(dds_day0)
dds_day0_results <- results(dds_day)

and so on.

However, if you question is more "Is the difference between complete and incomplete larger at Day2 than Day0, which I bet it is, then you should do:

day0_v_day2_counts <- em[,meta$day %in% c('0','2')]
dds_interation <- DEseqDataSetFromMatrix(
    countData =  em[,meta$day %in% c('0','2')],
   colData = meta[, meta$day %in% c('0','2')],
   design = ~ timepoint + outcome + timepoint:outcome)
dds_interaction <- DESeq(dds_interaction, test="LRT", reduced = ~ timepont + outcome)
dds_interaction_results <- results(dds_interaction)

The outcome:timepoint part of the design telling you how different the effect of outcome is at day2 than day0.

ADD COMMENT
0
Entering edit mode

Hi Ian @i.sudbery, Thank you for your reply, I appreciate your time and comprehensive advice.

Regarding your first suggestion in which you subset for week 0 and week 2 - if I am making the first plot, showing the gene counts across all timepoints along with the week0 week2 comparison, is it valid if the normalised counts for plotting originate from a non-subsetted dds of all timepoints (for example: norm_counts = normTransform(dds_all_timepoints) , while the week0vs2 statistical result shown in the plot comes from a week0&2 subsetted dds as you showed above? Said another way, can you indicate statistical significance calculated from a subsetted dds beside count data generated by another dds? I get the feeling that if I show counts, any stats comparisons should be generated from the same dds. I hope that makes sense

Thanks again.

ADD REPLY
0
Entering edit mode

Also, I'd love to hear your thoughts on this strategy. new group column added so that week0vs2 comparison can be pulled out easily with results, while also adjusting for patient matching. Thanks again

#new group column
meta$timepoint_outcome = paste0(meta$Timepoint, meta$outcome)

#DESeq2
dds_grouped <- DESeqDataSetFromMatrix(
  countData = em,
  colData = meta,
  design = ~ timepoint_outcome + patient 
)
dds_grouped <- DEseq(dds_grouped)

#get upregulated genes in INCOMPLETE
res_0 <- results(dds, contrast = c("timepoint_outcome", "0incomplete", "0complete"))
res_2 <- results(dds, contrast = c("timepoint_outcome", "2incomplete", "2complete"))
ADD REPLY
0
Entering edit mode

Yes, this should also work. Its probably better to do it this way, if you are using the normalised counts from the full dataset for plotting.

ADD REPLY

Login before adding your answer.

Traffic: 2060 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6