EdgeR model matrix for 2 plant genotypes with 2 treatments and 2 timepoints
Entering edit mode
4 months ago
User 4014 ▴ 40

Hi folks,

I am new in field of RNA-Seq and now trying to call DEGs from 2 plant genotypes (A -heat resistant and B-heat unresistant). I have 2 treatments (control vs treatment with 3 replicates for each, except one control I have only 2 replicates) and 2 timepoints (6 days and 12 days). Basically I would like to:

  1. Compare within and across timepoints for each genotype, e.g. A_treatment_day6 - A_control_day6 and (A_treatment_day12 - A_control_day12) - (A_treatment_day6 - A_control_day6)

  2. Compare within and across timepoints across genotypes (e.g. (A_treatment_day6 - A_control_day6I) - (B_treatment_day6 - B_control_day6)).

I am very confused whether it is better to build one matrix with genotypes, timepoints and treatment (~0+genotype+timepoint+treatment) and call DEGs from it, or it is better to define each treatment combination as a group (~0 + group; as in the section 3.3.1 of edgeR manual). Since the transcriptional responses can be quite different at 6 and 12 days, another thought I have is that perhaps it is better to work on each timepoint separately and treat each treatment combination as a group? But how can I avoid creating the (~0+genotype+timepoint+treatment) matrix for (2)?

May I share your thoughts what would be the best approach for this analysis? I read through edgeR manual a few times, but with my limited statistical knowledge I am not sure I understand it by heart. I greatly appreciate all suggestions and guidance. Thank you very much and I am looking forward to hearing from you.

RNA-Seq R • 251 views
Entering edit mode
4 months ago

A_treatment_day6 - A_control_day6 for this simple case it would be best to concatenate all of your factor levels into one term, such as A_treatment_day6. For the example we'll call this new factor level genotype_treatment_timepoint. Your formula would be ~ genotype_treatment_timepoint.

(A_treatment_day12 - A_control_day12) - (A_treatment_day6 - A_control_day6) and permutations of this in your question are answered by an interaction term, so for this concatenate your genotype and treatment into one factor (such as A_treatment, A_control) that we will call genotype_treatment, and then your formula would be ~ genotype_treatment + timepoints + genotype_treatment:timepoints, which the desired contrast being the interaction term genotype_treatment:timepoints.

Entering edit mode

Hi, Thank you very much for your help. It is very useful. I tried making a code for (A_treatment_day12 - A_control_day12) - (A_treatment_day6 - A_control_day6). Do you mind taking a look at it and let me know how to improve it please?

`rawCountTable <- read.table("counts.txt", header=TRUE, sep="\t", row.names=1)
sampleInfo4 <- read.table("sampleinfo4.csv", header=TRUE, sep=",", row.names=1)
genotype_treatment <- factor(sampleInfo4$genotype_treatment) 
genotype_treatment <- relevel(genotype_treatment, ref="A_control") 
timepoints <- factor(sampleInfo4$timepoints)
timepoints <- relevel(timepoints,ref="6")
Group <- factor(paste(sampleInfo4$genotype_treatment,sampleInfo4$timepoints,sep="."))
design <- model.matrix(~genotype_treatment * timepoints, data=y$samples)   # There are 4 coefficients here: [1] "(Intercept)", 
                [2] "genotype_treatmentA_treatment", [3] "timepoints9", [4] "genotype_treatmentA_treatment:timepoints9"
y <- DGEList(counts=rawCountTable[,1:11], group=Group)
keep <- filterByExpr(y)
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y, method='TMM')
y <- estimateDisp(y, design, robust=TRUE)
fit <- glmQLFit(y, design)
qlf <- glmQLFTest(fit, coef=4)`

Still I have to figure out how to analyze (A_treatment_day6 - A_control_day6I) - (B_treatment_day6 - B_control_day6), but I guess I have to use each treatment combination as a group.

Thank you very much and have a great day!

Entering edit mode

(A_treatment_day6 - A_control_day6I) - (B_treatment_day6 - B_control_day6) would be another interaction term. Combine treatment and timepoint into one factor level (I'll call treatment_timepoint), and your formula would be ~ treatment_timepoint + genotype + treatment_timepoint:genotype. The coefficient of interest would again be your interaction term treatment_timepoint:genotype.

As for your code above I tend to use DESeq2 more, but as long as you selected the correct coefficient that code looks good otherwise.


Login before adding your answer.

Traffic: 1519 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6