Question

DESeq2 model design: dose and time effect

3

Entering edit mode

8.7 years ago

CuriusScientist ▴ 50

I have three samples each with 3 biological replicates i.e. one control and other at dose 12ug/ml and 25ug/ml at 24hr

> library('DESeq2')
> directory<-"./data"
> sampleFiles <- grep("Htseq",list.files(directory),value=TRUE)
> sampleCondition<-c("treated","treated","treated","treated","treated","treated","untreated","untreated","untreated")
> sampleGroup <-c ("A","A","A","B","B","B","C","C","C")
> sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition, group=sampleGroup)
> sampleTable

    sampleName    fileName    condition      group
1    24h_12ug_1.count    24h_12ug_1.count    treated    A
2    24h_12ug_2.count    24h_12ug_2.count    treated    A
3    24h_12ug_3.count    24h_12ug_3.count    treated    A
4    24h_25ug_1.count    24h_25ug_1.count    treated    B
5    24h_25ug_2.count    24h_25ug_2.count    treated    B
6    24h_25ug_3.count    24h_25ug_3.count    treated    B
7    24h_ctrl_1.count    24h_ctrl_1.count    untreated    C
8    24h_ctrl_2.count    24h_ctrl_2.count    untreated    C
9    24h_ctrl_3.count    24h_ctrl_3.count    untreated    C

What should be good design if I want to see the differential expressed genes between treated vs untreated as well between different groups?

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~group)

OR

ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition)

What will be the difference between the two designs?

I tried the following design but it gave me error

ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~group + condition + condition:group)
Error in DESeqDataSet(se, design = design, ignoreRank) : the model matrix is not full rank, so the model cannot be fit as specified. one or more variables or interaction terms in the design formula are linear combinations of the others and must be removed

How will the design will change if I add three samples each with 3 biological replicates i.e. one control and other at dose 12ug/ml and 25ug/ml at 6hr

RNA-Seq differential-expression deseq2 • 7.2k views

ADD COMMENT • link updated 18 months ago by Ram 43k • written 8.7 years ago by CuriusScientist ▴ 50

Ram · Accepted Answer · 2015-08-12

6

Entering edit mode

8.7 years ago

Devon Ryan 104k

Did you post this on SEQanswers too? I feel like I've seen it before.

If you want to look at treated vs. untreated and between group comparisons then just use ~group and use the appropriate contrast for the treated vs. untreated comparison.
The difference between ~group and ~condition is that in one case you're asking, "What's the difference due to treatment type?" while in the second case "What's the difference due to treating at all...ignoring that there are multiple treatments being used?" In reality, using ~condition will probably yield crappy results, since you have a batch effect that you're not compensating for.
You got that error because you can't simultaneously determine the effect of group C and the untreated condition...since they're the same. Similarly, the interaction terms are identical to the group terms, so they can't be determined either.
If you add more samples/conditions then the model will need to be changed according to the question(s) you want to ask. Given that you wanted inter-group comparisons before, I suspect you'll need to just add things as new groups and use appropriate contrasts for each question.

ADD COMMENT • link 8.7 years ago by Devon Ryan 104k

0

Entering edit mode

Yes, I posted at SEQanswers but was unaware of the fact that I can't delete that post.

Thanks for the reply which will lead to better understanding of DESeq2 model design

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.7 years ago by CuriusScientist ▴ 50

0

Entering edit mode

I made the model according to following design

ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~group)
results(dds, contrast=c("group","B","C"))
results(dds, contrast=c("group","A","C"))

gave me results

However, when I tired contrast for trated vs untreated comparison it gave me an error

> results(dds, contrast=c("condition","treated","untreated"))
Error in cleanContrast(object, contrast, expanded = isExpanded, listValues = listValues,  :
  conditiontreated and conditionuntreated are expected to be in resultsNames(object)

When I tried resultsNames(object) I got following result which didn't have conditiontreated and conditionuntreated

> resultsNames(dds)
[1] "Intercept" "groupA"    "groupB"    "groupC"

Another query:

is results(dds, contrast=c("group","A","C")) EQUAL TO results(dds, contrast=c("group","C","A"))? As I know first value is numerator and second one is denominator. If not, then how to decide which one to have as numerator or denominator?

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.7 years ago by CuriusScientist ▴ 50

0

Entering edit mode

To compare conditions you would compare the average of groups A and B with group C (a vector of c(0.5, 0.5, -1) would probably work). You can't use contrast=c("condition", ...) because condition isn't in your model.

Yes swapping A and C in the contrast will produce essentially the same results, just with the sign on the log fold change swapped (since you're swapping the numerator and denominator).

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.7 years ago by Devon Ryan 104k

0

Entering edit mode

Thanks for the reply.

When I used results(dds, contrast=c(0.5,0.5,-1)) I got an error

Error in cleanContrast(object, contrast, expanded = isExpanded, listValues = listValues,  :
  numeric contrast vector should have one element for every element of 'resultsNames(object)'

But it worked with results(dds, contrast=c(0,0.5,0.5,-1))

log2 fold change (MAP): 0,+0.5,+0.5,-1
Wald test p-value: 0,+0.5,+0.5,-1

Is this right?

I didn't get why we used -1 instead of 1. can you explain more about it.

Now if sampleTable looks like

     sampleName          fileName            condition    group
1    24h_12ug_1.count    24h_12ug_1.count    treated      A
2    24h_12ug_2.count    24h_12ug_2.count    treated      A
3    24h_12ug_3.count    24h_12ug_3.count    treated      A
4    24h_25ug_1.count    24h_25ug_1.count    treated      B
5    24h_25ug_2.count    24h_25ug_2.count    treated      B
6    24h_25ug_3.count    24h_25ug_3.count    treated      B
7    24h_ctrl_1.count    24h_ctrl_1.count    untreated    C
8    24h_ctrl_2.count    24h_ctrl_2.count    untreated    C
9    24h_ctrl_3.count    24h_ctrl_3.count    untreated    C
10    6h_12ug_1.count    6h_12ug_1.count     treated      D
11    6h_12ug_2.count    6h_12ug_2.count     treated      D
12    6h_12ug_3.count    6h_12ug_3.count     treated      D
13    6h_25ug_1.count    6h_25ug_1.count     treated      E
14    6h_25ug_2.count    6h_25ug_2.count     treated      E
15    6h_25ug_3.count    6h_25ug_3.count     treated      E
16    6h_ctrl_1.count    6h_ctrl_1.count     untreated    F
17    6h_ctrl_2.count    6h_ctrl_2.count     untreated    F
18    6h_ctrl_3.count    6h_ctrl_3.count     untreated    F

The comparison of the groups will be done as previously

to compare the treated versus untreated will the following command be correct

results(dds, contrast=c(0,0.25,0.25,0.5,0.25,0.25,0.5))

Correct me if I am wrong. In the first scenario, when we had only one time and different dose, res will give differential expressed genes due to different treatment type. However, when in second case, when I added samples from 6 hour then res will give differential expressed genes across different treatment and time points. Moreover, we will use contrast if we want to compare different groups.

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.7 years ago by CuriusScientist ▴ 50

0

Entering edit mode

I guess resultNames() includes the intercept, I didn't check. The values in the vector are coefficients. So to see if the average of A and B are different from see, we need to ask if (A/2)+(B/2)-C != 0. So, you have coefficients of 0.5, 0.5, and -1. The coefficients should always sum to 0 (otherwise you'll get wrong p-values).

The second contrast would then be c(0,0.25, 0.25, -0.5, 0.25, 0.25, -0.5), which corresponds to (A+B+D+E)/4 - (C+F)/2. You might want to consult/collaborate with a local statistician as your models get more complex as it's otherwise very easy to get confused by these.

ADD REPLY • link 8.7 years ago by Devon Ryan 104k

0

Entering edit mode

thanks for the reply.

now things are getting more clear. I can certainly collaborate with a statistician but I need to understand things myself.

will read "Statistical Models" by Freedman mentioned by you in order to get improve my stats.

thanks for all the help.

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.7 years ago by CuriusScientist ▴ 50

0

Entering edit mode

Hello,

I have a nearly identical experimental design as imsharmanitin; 2 treatment groups and 2 batches, but one of the batches is exclusively one treatment.

sample    treatment    batch
3         control      A
4         control      A
5         control      B
6         control      B
7         control      B
13        control      B
14        control      B
15        BD           A
16        BD           A
17        BD           A
18        BD           A
18        BD           A

It is a poor experimental design, but unfortunately it is the data that I currently must work with. For Devon's point #2 above: I do want to see "What's the difference due to treating at all?" To account for my the potential batch effect of the B batch, I am using the design formula:

dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData,design = ~ batch + treatment)
dds$treatment <- factor(dds$treatment, levels=c("control", "BD"))
dds$batch <- factor(dds$batch, levels=c("A", "B"))
dds <- DESeq(dds, full=design(dds), reduced = ~ batch)

The results give me many less DE genes than if I simply ignored the batches and only use ~ treatment. This makes sense, because according to PCA and clustering, there is a batch effect in my samples. I've read the DESeq2 manual and many posts, but am not a statistician and would love to hear feedback for the design I'm using here. Thank you!

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.6 years ago by gracechappell ▴ 100