Question: DESeq2 model design: dose and time effect
2
gravatar for Nitin Sharma
3.6 years ago by
Nitin Sharma30
United Kingdom
Nitin Sharma30 wrote:

i have three samples each with 3 biological replicates i.e. one control and other at dose 12ug/ml and 25ug/ml at 24hr

> library('DESeq2')

> directory<-"./data"

> sampleFiles <- grep("Htseq",list.files(directory),value=TRUE)

> sampleCondition<-c("treated","treated","treated","treated","treated","treated","untreated","untreated","untreated")

> sampleGroup <-c ("A","A","A","B","B","B","C","C","C")

> sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition, group=sampleGroup)

> sampleTable

    sampleName    fileName    condition      group
1    24h_12ug_1.count    24h_12ug_1.count    treated    A
2    24h_12ug_2.count    24h_12ug_2.count    treated    A
3    24h_12ug_3.count    24h_12ug_3.count    treated    A
4    24h_25ug_1.count    24h_25ug_1.count    treated    B
5    24h_25ug_2.count    24h_25ug_2.count    treated    B
6    24h_25ug_3.count    24h_25ug_3.count    treated    B
7    24h_ctrl_1.count    24h_ctrl_1.count    untreated    C
8    24h_ctrl_2.count    24h_ctrl_2.count    untreated    C
9    24h_ctrl_3.count    24h_ctrl_3.count    untreated    C

what should be good design if i want to see the differential expressed genes between treated vs untreated as well between different groups ?

ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~group)
                                           OR
ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition)

what will be the difference between the two designs?


I tried the following design but it gave me error

ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~group + condition + condition:group)

Error in DESeqDataSet(se, design = design, ignoreRank) : the model matrix is not full rank, so the model cannot be fit as specified. one or more variables or interaction terms in the design formula are linear combinations of the others and must be removed


how will the design will change if i add three samples each with 3 biological replicates i.e. one control and other at dose 12ug/ml and 25ug/ml at 6hr

ADD COMMENTlink modified 3.6 years ago by Devon Ryan88k • written 3.6 years ago by Nitin Sharma30
6
gravatar for Devon Ryan
3.6 years ago by
Devon Ryan88k
Freiburg, Germany
Devon Ryan88k wrote:

Did you post this on SEQanswers too? I feel like I've seen it before.

  1. If you want to look at treated vs. untreated and between group comparisons then just use ~group and use the appropriate contrast for the treated vs. untreated comparison.
  2. The difference between ~group and ~condition is that in one case you're asking, "What's the difference due to treatment type?" while in the second case "What's the difference due to treating at all...ignoring that there are multiple treatments being used?" In reality, using ~condition will probably yield crappy results, since you have a batch effect that you're not compensating for.
  3. You got that error because you can't simultaneously determine the effect of group C and the untreated condition...since they're the same. Similarly, the interaction terms are identical to the group terms, so they can't be determined either.
  4. If you add more samples/conditions then the model will need to be changed according to the question(s) you want to ask. Given that you wanted inter-group comparisons before, I suspect you'll need to just add things as new groups and use appropriate contrasts for each question.
ADD COMMENTlink written 3.6 years ago by Devon Ryan88k

yes, i posted at SEQanswers but was unaware of the fact that i can't delete that post. 

thanks for the reply which will lead to better understanding of DESeq2 model design

ADD REPLYlink written 3.6 years ago by Nitin Sharma30

i made the model according to following design

ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~group)
 

results(dds, contrast=c("group","B","C"))

results(dds, contrast=c("group","A","C"))

gave me results

However, when i tired contrast for trated vs untreated comparison it gave me an error

> results(dds, contrast=c("condition","treated","untreated"))
Error in cleanContrast(object, contrast, expanded = isExpanded, listValues = listValues,  :
  conditiontreated and conditionuntreated are expected to be in resultsNames(object)

when  i tried resultsNames(object) i got following result which didn't have conditiontreated and conditionuntreated

> resultsNames(dds)
[1] "Intercept" "groupA"    "groupB"    "groupC"

 

another query :

is results(dds, contrast=c("group","A","C"))   EQUAL TO results(dds, contrast=c("group","C","A")) ?  as i know first value is numerator and second one is denominator. If not, then how to decide which one to have as numerator or denominator? 

 

ADD REPLYlink written 3.6 years ago by Nitin Sharma30

To compare conditions you would compare the average of groups A and B with group C (a vector of c(0.5, 0.5, -1) would probably work). You can't use contrast=c("condition", ...) because "condition" isn't in your model.

Yes swapping A and C in the contrast will produce essentially the same results, just with the sign on the log fold change swapped (since you're swapping the numerator and denominator).
 

ADD REPLYlink written 3.6 years ago by Devon Ryan88k

thanks for the reply.

When i used results(dds, contrast=c(0.5,0.5,-1)) I got an error

Error in cleanContrast(object, contrast, expanded = isExpanded, listValues = listValues,  :
  numeric contrast vector should have one element for every element of 'resultsNames(object)'

But it worked with  results(dds, contrast=c(0,0.5,0.5,-1))
log2 fold change (MAP): 0,+0.5,+0.5,-1
Wald test p-value: 0,+0.5,+0.5,-1

is this right?

i didn't get why we used -1 instead of 1. can you explain more about it.

Now if the sampleTable look like

   sampleName     fileName   condition group
1 24h_12ug_1.count 24h_12ug_1.count treated A
2 24h_12ug_2.count 24h_12ug_2.count treated A
3 24h_12ug_3.count 24h_12ug_3.count treated A
4 24h_25ug_1.count 24h_25ug_1.count treated B
5 24h_25ug_2.count 24h_25ug_2.count treated B
6 24h_25ug_3.count 24h_25ug_3.count treated B
7 24h_ctrl_1.count 24h_ctrl_1.count untreated C
8 24h_ctrl_2.count 24h_ctrl_2.count untreated C
9 24h_ctrl_3.count 24h_ctrl_3.count untreated C
10 6h_12ug_1.count 6h_12ug_1.count treated D
11 6h_12ug_2.count 6h_12ug_2.count treated D
12 6h_12ug_3.count 6h_12ug_3.count treated D
13 6h_25ug_1.count 6h_25ug_1.count treated E
14 6h_25ug_2.count 6h_25ug_2.count treated E
15 6h_25ug_3.count 6h_25ug_3.count treated E
16 6h_ctrl_1.count 6h_ctrl_1.count untreated F
17 6h_ctrl_2.count 6h_ctrl_2.count untreated F
18 6h_ctrl_3.count 6h_ctrl_3.count untreated F

The comparison of the groups will be done as previously 

to compare the treated versus untreated will the following command be correct 

 results(dds, contrast=c(0,0.25,0.25,0.5,0.25,0.25,0.5))

 

Correct me if I am wrong. In the first scenario, when we had only one time and different dose, 'res' will give differential expressed genes due to different treatment type. However, when in second case, when I added samples from 6 hour then 'res' will give differential expressed genes across different treatment and time points. Moreover, we will use contrast if we want to compare different groups.  

ADD REPLYlink written 3.6 years ago by Nitin Sharma30

I guess resultNames() includes the intercept, I didn't check. The values in the vector are coefficients. So to see if the average of A and B are different from see, we need to ask if (A/2)+(B/2)-C != 0. So, you have coefficients of 0.5, 0.5, and -1. The coefficients should always sum to 0 (otherwise you'll get wrong p-values).

The second contrast would then be c(0,0.25, 0.25, -0.5, 0.25, 0.25, -0.5), which corresponds to (A+B+D+E)/4 - (C+F)/2. You might want to consult/collaborate with a local statistician as your models get more complex as it's otherwise very easy to get confused by these.

ADD REPLYlink written 3.6 years ago by Devon Ryan88k

thanks for the reply. 

now things are getting more clear.  I can certainly collaborate with a statistician but I need to understand things myself.  

will read  "Statistical Models" by Freedman mentioned by you in order to get improve my stats.

thanks for all the help.

ADD REPLYlink written 3.6 years ago by Nitin Sharma30

Hello,

I have a nearly identical experimental design as imsharmanitin; 2 treatment groups and 2 batches, but one of the batches is exclusively one treatment. 

sample    treatment    batch
3    control    A
4    control    A
5    control    B
6    control    B
7    control    B
13  control    B
14  control    B
15   BD    A
16   BD    A
17   BD    A
18   BD    A
18   BD    A

It is a poor experimental design, but unfortunately it is the data that I currently must work with.  For Devon's point #2 above: I do want to see "What's the difference due to treating at all?"  To account for my the potential batch effect of the B batch, I am using the design formula:

dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData,design = ~ batch + treatment)
dds$treatment <- factor(dds$treatment, levels=c("control", "BD"))
dds$batch <- factor(dds$batch, levels=c("A", "B"))
dds <- DESeq(dds, full=design(dds), reduced = ~ batch)

The results give me many less DE genes than if I simply ignored the batches and only use ~ treatment.  This makes sense, because according to PCA and clustering, there is a batch effect in my samples. I've read the DESeq2 manual and many posts, but am not a statistician and would love to hear feedback for the design I'm using here.  Thank you!

ADD REPLYlink written 3.6 years ago by gracechappell40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1139 users visited in the last hour