**20**wrote:

I am new to RNA-Seq analysis, and while I have read the EdgeR manual, several publications, and many biostars posts in order to inform my analysis plan, I would like to verify that I am understanding how to analyze the data properly before I begin. [I have limited knowledge of programming in R which is based almost entirely on these aforementioned sources].

We have three strains of animals (A,B,C) which were fed on two different diets (X, Y). Each strain x diet subgroup contains three animals, giving a total of 18 samples with three replicates for each strain x diet group.

We would like to look at the effect diet has on gene expression *within each strain*, (e.g. A.X vs A.Y) and the differences in gene expression *between strains* on the same diet (e.g. A.X vs B.X vs C.X)

**EDIT**:Since this is a two-way ANOVA design should I also include a test to determine if there is an interaction between strain and diet?*.

```
group <- c("A.X", "A.Y", "B.X", "B.Y", "C.X", "C.Y")
y <- DGEList(counts=count, group=group)
```

My first thought is to test if there are *any* differences in gene expression between any groups:

```
design <- model.matrix(~group)
colnames(design) <- levels(group)
fit <- glmQLFit(y, design)
topTags(glmQLFTest(fit, coef=2:6))
```

And then if there is, to proceed to conducting the pairwise tests:

```
design2 <- model.matrix(~0+group)
colnames(design2)<-levels(group)
fit2<-glmQLFit(y,design2)
con <- makeContrasts(Adiet = A.X-A.Y, Bdiet = B.X-B.Y, Cdiet = C.X-C.Y, XdietAB = A.X-B.X, YdietAB=A.Y-B.Y, etc)
Adiet.results <- topTags(glmQLFTest(fit2, contrast=con[,"Adiet"]))
```

After I conduct all the pairwise comparisons I would re-adjust the p-values, and add the adjusted p-values to the original results table and filter out the significant genes and place them in their own table:

```
all.pvals <- cbind(Adiet.results$table$PValue, Bdiet.results$table$PValue ...)
all.adjP <- p.adjust(all.pvals, method = "BH")
Adiet.results$table$AdjP <- all.adjP[,1]
Adiet.keep <- Adiet.results$table$AdjP <= 0.05
AdietDE <- Adiet.results[Adiet.keep,]
```

**EDIT**: Interaction between strain and diet:

```
strain <- factor(rep(c("A","B","C"),times=2))
diet <- factor(rep(c("X", "Y"), times=3))
design3 <- model.matrix(~strain*diet)
fit3 <- glmQLFit(y, design3)
topTags(glmQLFTest(fit3, coef=5:6))
```

In addition to asking if this is a reasonable design/analysis plan I had a couple additional questions:

- Within each strain, litter mates were chosen and placed on the two different diets. Is this a situation where blocking would be appropriate and if not, why not, and what would be an appropriate situation to incorporate blocking?
- Is it appropriate to do an ANOVA-like test on this dataset to detect if there are any differences between any of the groups? If I understand ANOVAs correctly, they only accurately detect differences if the data are normally distributed and have equal variances. Do the estimate dispersion steps (not shown but I will do) or any of the other steps prior to the actual testing (glmQLFtest()) account for this issue?
- When I go to correct the p-values for the multiple tests do I also need to re-correct the p-values for the ANOVA-like test to see if there are any differences between any of the groups? Or is that a stand alone test?

Thank you for all of your help, ahhn