6.2 years ago
Biologist
I have 4 samples: 2 control and 2 Gene_OE (overexpression) samples.
I want to run a differential expression analysis between the Gene_OE and Control samples. My sample column data looks like this:
coldata:
Samples Type Time
SampleA Control Day1
SampleB Control Day2
SampleD Gene_OE Day1
SampleE Gene_OE Day2
Using edgeR, I did the following:
library(edgeR)
group <- factor(paste0(coldata$Type))
And created the design matrix as follows:
design2 <- model.matrix(~ 0 + group + coldata$Time)
design2
Control Gene_OE day1 day2
1 1 0 0 0
2 1 0 1 0
3 0 1 0 0
4 0 1 1 0
I see a warning message:
y <- estimateDisp(y, design2, robust=TRUE)
Warning message:
In estimateDisp.default(y = y$counts, design = design, group = group, :
No residual df: setting dispersion to NA
And an error like the one below:
fit <- glmQLFit(y, design2, robust=TRUE)
Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset, :
Design matrix not of full rank. The following coefficients not estimable:
day2
What could be the reason for this error, and how can I resolve it?
But if my coldata looks like the one below, I don't see any error:
coldata:
Is this right?
The above has the very same problem as your original post. You need more replicates per treatment. Each day has only one sample; you need more samples for the same day.
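To see why the fit fails, it can help to inspect the design matrix directly before calling edgeR. The following is a minimal sketch in base R, assuming the matrix is exactly the one printed in the question (typed in by hand here, since the original `coldata` object is not available). The variable names `design_rank`, `spare_df`, and `empty_cols` are made up for illustration.

```r
# Design matrix copied by hand from the output shown in the question
# (4 samples in rows, 4 coefficients in columns).
design2 <- matrix(
  c(1, 0, 0, 0,
    1, 0, 1, 0,
    0, 1, 0, 0,
    0, 1, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(1:4, c("Control", "Gene_OE", "day1", "day2"))
)

# Number of linearly independent columns in the design.
design_rank <- qr(design2)$rank

# Samples left over after counting one coefficient per column.
spare_df <- nrow(design2) - ncol(design2)

# Columns that are zero for every sample cannot be estimated.
empty_cols <- colnames(design2)[colSums(design2 != 0) == 0]

print(design_rank)   # compare with ncol(design2)
print(spare_df)
print(empty_cols)
```

Here the rank is lower than the number of columns, so the matrix is not of full rank, and the all-zero `day2` column is the one named in the error. The design also has as many columns as there are samples, which is consistent with the "No residual df" warning: nothing is left over to estimate dispersion from.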
Could you please tell me whether the approach above is right, or whether I should add more samples per treatment?