Question

Error in differential analysis for samples with different time points

6.2 years ago

Biologist ▴ 290

I have 4 samples: 2 control and 2 Gene_OE (overexpression) samples.

I want to run a differential expression analysis of Gene_OE vs. Control samples. My sample column data looks like this:

coldata:

Samples Type    Time
SampleA Control Day1
SampleB Control Day2
SampleD Gene_OE Day1
SampleE Gene_OE Day2

Using edgeR, I did the following:

library(edgeR)
group <- factor(paste0(coldata$Type))

And created the design matrix as follows:

design2 <- model.matrix(~ 0 + group + coldata$Time)
design2

    Control Gene_OE day1 day2
1       1        0    0    0
2       1        0    1    0
3       0        1    0    0
4       0        1    1    0

I see a warning message:

y <- estimateDisp(y, design2, robust=TRUE)
Warning message:
In estimateDisp.default(y = y$counts, design = design, group = group,  :
  No residual df: setting dispersion to NA

And then an error:

fit <- glmQLFit(y, design2, robust=TRUE)
Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  : 
  Design matrix not of full rank.  The following coefficients not estimable:
 day2

What could be the reason for this error, and how can I resolve it?

RNA-Seq r edger differential analysis • 3.7k views

score 1 · Answer 1 · 2019-03-14

6.2 years ago

h.mon 35k

The design matrix is not full rank because you have only one sample (no biological replicates) per combination of treatment (type+time). You may either drop day from the analysis, or add more samples per treatment.
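
A quick way to inspect a design before fitting is to compare its number of samples, number of coefficients, and rank. The following is a minimal base R sketch, not edgeR code: `check_design` is a hypothetical helper, the `coldata` data frame is retyped from the question, and the `Day3` level in the last case is an assumption added only to show how a factor level without samples produces an all-zero column like the `day2` column printed above.

```r
# Hypothetical helper (not part of edgeR): summarise a design matrix.
check_design <- function(design) {
  c(samples      = nrow(design),
    coefficients = ncol(design),
    rank         = qr(design)$rank)
}

# Sample table retyped from the question.
coldata <- data.frame(
  Type = factor(c("Control", "Control", "Gene_OE", "Gene_OE")),
  Time = factor(c("Day1", "Day2", "Day1", "Day2"))
)

# Additive model: one coefficient per Type plus one for Day2 vs Day1.
additive <- model.matrix(~ 0 + Type + Time, data = coldata)

# One coefficient per Type/Time combination: as many coefficients as
# samples, so nothing is left over to estimate the dispersion.
cell <- factor(paste(coldata$Type, coldata$Time, sep = "."))
per_cell <- model.matrix(~ 0 + cell)

# ASSUMPTION for illustration: Time carries a level ("Day3") that no
# sample has. model.matrix() keeps it as a column of zeros, so the
# matrix has more columns than its rank.
time_extra <- factor(coldata$Time, levels = c("Day1", "Day2", "Day3"))
unused_level <- model.matrix(~ 0 + coldata$Type + time_extra)

print(rbind(additive     = check_design(additive),
            per_cell     = check_design(per_cell),
            unused_level = check_design(unused_level)))
```

A design whose rank is smaller than its number of coefficients is not of full rank, and a design with as many coefficients as samples leaves no residual degrees of freedom. If an unused level turns out to be the cause, `droplevels()` on the factor removes it; whether that applies to the original data cannot be told from the post.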

But if my coldata looks like the one below, I don't see any error:

coldata:

Samples Type    Time
SampleA Control Day1
SampleB Control Day2
SampleC Control Day3
SampleD Gene_OE Day1
SampleE Gene_OE Day2
SampleF Gene_OE Day3

design2 <- model.matrix(~ 0 + group + coldata$Time)
design2

    Control Gene_OE Day2 Day3
1       1        0    0    0
2       1        0    1    0
3       1        0    0    1
4       0        1    0    0
5       0        1    1    0
6       0        1    0    1

Is this right?

6.2 years ago by Biologist ▴ 290

The above has the very same problem as your original post. You need more replicates per treatment: each type has only one sample per day, and you need more than one sample for the same type and day.
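
To see why the six-sample design fits without an error even though replication is still missing, it can help to count samples per type/day combination alongside the rank of the design. This is a base R sketch under the assumption that `coldata6` (a name introduced here) matches the six-sample table in the reply above; it does not call edgeR.

```r
# Six-sample table retyped from the reply above (coldata6 is a made-up name).
coldata6 <- data.frame(
  Type = factor(rep(c("Control", "Gene_OE"), each = 3)),
  Time = factor(rep(c("Day1", "Day2", "Day3"), times = 2))
)

# Additive design, as in the thread: 6 samples and 4 coefficients.
design6 <- model.matrix(~ 0 + Type + Time, data = coldata6)
design_rank <- qr(design6)$rank
residual_df <- nrow(design6) - design_rank

# Number of samples in each Type/Time combination.
replicates <- table(coldata6$Type, coldata6$Time)

print(c(rank = design_rank, residual_df = residual_df))
print(replicates)
```

The additive design is of full rank with a couple of residual degrees of freedom, which is why no error appears, but every cell of the `replicates` table holds a single sample. Any variation between the two types within a day is therefore indistinguishable from a type-specific time effect, which is the replication problem described above.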

6.2 years ago by h.mon 35k

Could you please tell me whether the approach above is right, or whether I should add more samples per treatment?

6.2 years ago by Biologist ▴ 290