Question: Limma time-course: how to apply differential contrast if time zero is considered both as treatment and control?
3
2.5 years ago by
harelarik50
harelarik50 wrote:

Summary: To find which genes respond to 1 hr cold stress we would like to use the "differential contrast" suggested in page 47 of the limma user guide. However it is not trivial, as in our experiment time zero is considered both as "time zero none stress", and "time zero cold stress".**

Goal: We are using limma to analyze time course experiment with 3 time points, and one treatment vs. control. We would like to test this combination of time course and one cold stress as follows: Amb_0hr (no stress time zero), Amb_1hr, Amb_6hr, Cld_1hr (cold stress after 1 hour), Cld_6hr.

Experiment description: we started with 5 flasks of bacteria; extracted RNA for time zero from one of them; applied cold stress only on two of them; extracted RNA after 1hr from one stressed and one control flasks; extracted RNA after 6 hours form one stressed and one control flasks (everything was done in triplicates).

Issue": If we want to find which genes respond to 1 hr cold stress we would like to use the "differential contrast" suggested in page 47 of the user guide. While it is straightforward for 6hr cold stress (i.e., makeContrasts( Dif_6hr=(Cld _6hr- Cld _1hr)-(Amb_6hr-Amb_1hr), levels=design) it is not trivial for 1hr cold stress. The reason for that is that, in our experiment time zero is considered both as "time zero none stress", and "time zero cold stress".

Considering this. Coluld you please suggest what is the best way to build differential contrast to 1hr hour cold stress?

Potential solutions:

A. For 1 hour cold stress:

I was thinking of using this contrast:

cont.wt <- makeContrasts( "Cld _1hr-Amb_0hr", "Amb_1hr-Amb_0hr", "Cld _6hr- Cld _1hr", "Amb_6hr-Amb_1hr", levels=design)

And then to substract the list of over expressed genes found in "Amb_1hr-Amb_0hr" from the list of genes found by "Cld _1hr-Amb_0hr" (same is done for under expressed).

Is this a correct approach?

B. For 6 hour cold stress:

I have two options:

a. Using the same contrast used for 1hr cold stress above, and then to subsutract the list of over expressed genes found by "Amb_6hr-Amb_1hr" from the list of overexpressed genes found by "Cld _6hr- Cld _1hr".

OR

b. Using the differential contrast based on page 47 of the user guide:

cont.wt <- makeContrasts( Dif_6hr=(Cld _6hr- Cld _1hr)-(Amb_6hr-Amb_1hr), levels=design)

Which one is better for 6 hours?

WHAT IS THE BEST CONTRATS TO ASK "WHICH GENES RESPOND AT EITHER 1 HOUR OR 6 HOUR OF COLD STRESS?

Thank you.

Detailed ANALYSIS:

1. Data taken from RNA-seq experiment.

2. Design matrix (two time points :1,6 hr; treatments: control, coldStress, triplicate for each treatment):

Amb_0hr Amb_1hr Amb_6hr Cld_1hr Cld_6hr

1 1 0 0 0 0

2 1 0 0 0 0

3 1 0 0 0 0

4 0 1 0 0 0

5 0 1 0 0 0

6 0 1 0 0 0

7 0 0 1 0 0

8 0 0 1 0 0

9 0 0 1 0 0

10 0 0 0 1 0

11 0 0 0 1 0

12 0 0 0 1 0

13 0 0 0 0 1

14 0 0 0 0 1

15 0 0 0 0 1

1. Commands used:

lev<-c("Amb_0hr","Amb_1hr","Amb_6hr","Cld_1hr","Cld_6hr")

all_groups_repeats<-c("Amb_0hr", "Amb_0hr", "Amb_0hr", "Amb_1hr", "Amb_1hr", "Amb_1hr", "Amb_6hr", "Amb_6hr", "Amb_6hr", "Cld_1hr", " Cld _1hr", " Cld _1hr", " Cld _6hr", " Cld _6hr", " Cld _6hr")

f <- factor(all_groups_repeats, levels=lev)

design <- model.matrix(~0+f)

colnames(design) <- lev

fit <- lmFit(eset, design)

v <- voom(xCurrent, design, plot=FALSE)#Voom fit

fit <- lmFit(v , design)

Contrast used:

cont.wt <- makeContrasts(

Dif1_hr =(Cld _1hr-Amb_0hr)-(Amb_1hr-Amb_0hr),

Dif_6hr=(Cld _6hr-Cld _1hr)-(Amb_6hr-Amb_1hr),

levels=design)

fit2 <- contrasts.fit(fit, cont.wt)

fit2 <- eBayes(fit2)

Final DE analysis

wt<-decideTests(fit2,p.value = 0.05, adjust.method = "BH", method = "global")

summary(wt)#Produce summary table

limma time course • 1.8k views
modified 2.5 years ago by Kevin Blighe59k • written 2.5 years ago by harelarik50
1
2.5 years ago by
Kevin Blighe59k
Kevin Blighe59k wrote:

Hi harelarik,

From my experience of using limma, it's best to keep the model formula as simple as possible. Well, nothing against limma, really, as an overly complex formula in any regression analysis can produce unreliable information. One of my main worries in your situation is the low sample number.

Edit (October 26, 2019): with a complex formula, it can also be difficult to interpret the meaning of the results; whereas, the biological meaning of results produced from a more basic formula is, well, easier to interpret.

Given the relatively complex set-up and the low sample number, I believe you will find it extremely difficult to arrive at just a single list of statistical values and be able to say: 'This is my solution for 6 hour cold stress', et cetera.

I would begin by critically thinking about what each comparison is actually showing. For example, `Cld _6hr-Cld _1hr` will show me genes that are statistically significantly differentially expressed between the 1 hour and 6 hour time-points in the cold-stress samples, whereas `Amb_6hr-Amb_1hr` will show me the same but at ambient / room temperature. If I then compare these two listings (even by eye), I will gain insights into genes that are activated or repressed when cold-stress is applied.

Then do the same for 1 and 0 hours, and 6 and 0 hours. You will end up with 6 different gene listings, each showing various things.

## --------------------------------

By the way, in addition and considering that you have bacterial RNA-seq, you may be interested in taking a look at Rockhopper. It runs quickly and will return FASTA sequences that are statistically differentially expressed (and provide P and Q values). You can then infer functionality of these sequences via blastx.

Kevin

Thank you very much, Kevin! We have 3 biological repeats per samples so: 21 samples.

Cld _6hr-Cld _1hr will show me genes that are statistically differentially expressed between the 1 hour and 6 hour time-points in the >cold-stress samples, whereas Amb_6hr-Amb_1hr will show me the same but at ambient / room temperature. If I then compare >these two listings (even by eye), I will gain insights into genes that are activated or repressed when cold-stress is applied.

That was my initial thought. But then I saw the "differential contrast" suggested in page 47 of the limma user guide, and I thought it might have some advantage. I.e., is using: Dif_6hr=(Cld _6hr- Cld _1hr)-(Amb_6hr-Amb_1hr), better then what you suggested above?? The approach you have suggeted, sounds like the same thing, but I do not get the same number of differentially experssed genes in the final result.

Yes, you will not get the exact same results because the model comparisons are different. The way that I suggested just produces statistics for each pairwise comparison, whereas your model, `Dif_6hr=(Cld _6hr- Cld _1hr)-(Amb_6hr-Amb_1hr)`, is attempting to do a further comparison within the model. I just feel that this is too complex for the low numbers of samples that you have, and will produce unreliable results.

The way that I and others with whom I've worked have done these types of studies is through just the simple pairwise comparisons, looking over the results, and then piecing together a storyline based on these.

Hi Kevin,

I am after the exact same answer. Given that I would have enough samples, would the contrast mentioned above `(Dif_6hr=(Cld _6hr- Cld _1hr)-(Amb_6hr-Amb_1hr))` be correct?

This is an example of my own dataframe:

``````df <- data.frame(timepoint = rep_len(0:1, length.out=16),
subjectID = c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8),
group = c(rep("C", 8), rep("N", 8)))
``````

As you can see I have 6 subjects (more in my dataset), each one with two time points measurements. I am interested in knowing if my gene expression is statistically different between the two groups X and Y (in my case either control or treatment)

That formula seems okay for the purposes of gauging differences between Cold and Ambient with respect to time. Please take a look at Aaron's answer here, too: https://support.bioconductor.org/p/117545/#117557

Thank you Kevin, yes that post is indeed very similar to what I need to do. I have been fiddling with the code further and given that my data is slightly different I decided to post my concerns. Here, if you could please help: Limma/edgeR or other for metagenomic (non-RNA) data?

Very interesting question @harelarik, I'm facing the same problem right now. In case you had enough samples, would it be correct the makeContrasts( Dif_1hr=(Cld _1hr- Amb_0hr)-(Amb_1hr-Amb_0hr), levels=design) approach for checking the genes implicated in the 1 hour cold stress response?

Yes, that will be fine. However, if `Amb_0hr` is your reference level, then your contrast is the exact same as the following:

``````makeContrasts(Dif_1hr = Cld _1hr - Amb_1hr, levels=design)
``````