Question: EdgeR model matrix for 2 plant genotypes with 2 treatments and 2 timepoints
gravatar for User 4014
14 days ago by
User 401440
User 401440 wrote:

Hi folks,

I am new in field of RNA-Seq and now trying to call DEGs from 2 plant genotypes (A -heat resistant and B-heat unresistant). I have 2 treatments (control vs treatment with 3 replicates for each, except one control I have only 2 replicates) and 2 timepoints (6 days and 12 days). Basically I would like to:

  1. Compare within and across timepoints for each genotype, e.g. A_treatment_day6 - A_control_day6 and (A_treatment_day12 - A_control_day12) - (A_treatment_day6 - A_control_day6)

  2. Compare within and across timepoints across genotypes (e.g. (A_treatment_day6 - A_control_day6I) - (B_treatment_day6 - B_control_day6)).

I am very confused whether it is better to build one matrix with genotypes, timepoints and treatment (~0+genotype+timepoint+treatment) and call DEGs from it, or it is better to define each treatment combination as a group (~0 + group; as in the section 3.3.1 of edgeR manual). Since the transcriptional responses can be quite different at 6 and 12 days, another thought I have is that perhaps it is better to work on each timepoint separately and treat each treatment combination as a group? But how can I avoid creating the (~0+genotype+timepoint+treatment) matrix for (2)?

May I share your thoughts what would be the best approach for this analysis? I read through edgeR manual a few times, but with my limited statistical knowledge I am not sure I understand it by heart. I greatly appreciate all suggestions and guidance. Thank you very much and I am looking forward to hearing from you.

rna-seq R • 171 views
ADD COMMENTlink modified 14 days ago by rpolicastro3.2k • written 14 days ago by User 401440
gravatar for rpolicastro
14 days ago by
Bloomington, IN
rpolicastro3.2k wrote:

A_treatment_day6 - A_control_day6 for this simple case it would be best to concatenate all of your factor levels into one term, such as A_treatment_day6. For the example we'll call this new factor level genotype_treatment_timepoint. Your formula would be ~ genotype_treatment_timepoint.

(A_treatment_day12 - A_control_day12) - (A_treatment_day6 - A_control_day6) and permutations of this in your question are answered by an interaction term, so for this concatenate your genotype and treatment into one factor (such as A_treatment, A_control) that we will call genotype_treatment, and then your formula would be ~ genotype_treatment + timepoints + genotype_treatment:timepoints, which the desired contrast being the interaction term genotype_treatment:timepoints.

ADD COMMENTlink modified 14 days ago • written 14 days ago by rpolicastro3.2k

Hi, Thank you very much for your help. It is very useful. I tried making a code for (A_treatment_day12 - A_control_day12) - (A_treatment_day6 - A_control_day6). Do you mind taking a look at it and let me know how to improve it please?

`rawCountTable <- read.table("counts.txt", header=TRUE, sep="\t", row.names=1)
sampleInfo4 <- read.table("sampleinfo4.csv", header=TRUE, sep=",", row.names=1)
genotype_treatment <- factor(sampleInfo4$genotype_treatment) 
genotype_treatment <- relevel(genotype_treatment, ref="A_control") 
timepoints <- factor(sampleInfo4$timepoints)
timepoints <- relevel(timepoints,ref="6")
Group <- factor(paste(sampleInfo4$genotype_treatment,sampleInfo4$timepoints,sep="."))
design <- model.matrix(~genotype_treatment * timepoints, data=y$samples)   # There are 4 coefficients here: [1] "(Intercept)", 
                [2] "genotype_treatmentA_treatment", [3] "timepoints9", [4] "genotype_treatmentA_treatment:timepoints9"
y <- DGEList(counts=rawCountTable[,1:11], group=Group)
keep <- filterByExpr(y)
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y, method='TMM')
y <- estimateDisp(y, design, robust=TRUE)
fit <- glmQLFit(y, design)
qlf <- glmQLFTest(fit, coef=4)`

Still I have to figure out how to analyze (A_treatment_day6 - A_control_day6I) - (B_treatment_day6 - B_control_day6), but I guess I have to use each treatment combination as a group.

Thank you very much and have a great day!

ADD REPLYlink written 13 days ago by User 401440

(A_treatment_day6 - A_control_day6I) - (B_treatment_day6 - B_control_day6) would be another interaction term. Combine treatment and timepoint into one factor level (I'll call treatment_timepoint), and your formula would be ~ treatment_timepoint + genotype + treatment_timepoint:genotype. The coefficient of interest would again be your interaction term treatment_timepoint:genotype.

As for your code above I tend to use DESeq2 more, but as long as you selected the correct coefficient that code looks good otherwise.

ADD REPLYlink modified 13 days ago • written 13 days ago by rpolicastro3.2k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 948 users visited in the last hour