Full-rank design matrix?
6.5 years ago
star ▴ 350

I have the design matrix below for my data. When I run the commands to analyze and compare the different groups, I get an error.

I would like to make these comparisons: L4vsL6.L8, Q3vsQ5.Q7, QvsL

design matrix:

> design

                organoids_biological_samples method
L4_D49_rep_1                              L4      L
L4_D49_rep_2                              L4      L
L6_L8_D49_rep_1                        L6_L8      L
L6_L8_D49_rep_2                        L6_L8      L
Q3_D49_rep_1                              Q3      Q
Q3_D49_rep_2                              Q3      Q
Q5_Q7_D49_rep_1                        Q5_Q7      Q
Q5_Q7_D49_rep_2                        Q5_Q7      Q

> design$organoids_biological_samples <- factor(design$organoids_biological_samples, levels = c("L4","L6_L8", "Q3", "Q5_Q7"))
> design$method <- factor(design$method, levels = c("L", "Q"))

> all(rownames(design) %in% colnames(data))

> all(rownames(design) == colnames(data))

> Group <- factor(paste(design$organoids_biological_samples,design$method,sep="."))

> design<- cbind(design,Group)

> design.matrix <- model.matrix(~0+Group+method,design)

> colnames(design.matrix) <- c("L4.L", "L6_L8.L", "Q3.Q", "Q5_Q7.Q", "method")

> design.matrix

                L4.L L6_L8.L Q3.Q Q5_Q7.Q method
L4_D49_rep_1       1       0    0       0      0
L4_D49_rep_2       1       0    0       0      0
L6_L8_D49_rep_1    0       1    0       0      0
L6_L8_D49_rep_2    0       1    0       0      0
Q3_D49_rep_1       0       0    1       0      1
Q3_D49_rep_2       0       0    1       0      1
Q5_Q7_D49_rep_1    0       0    0       1      1
Q5_Q7_D49_rep_2    0       0    0       1      1
attr(,"assign")
[1] 1 1 1 1 2
attr(,"contrasts")
attr(,"contrasts")$Group
[1] "contr.treatment"

attr(,"contrasts")$method
[1] "contr.treatment"

> edgeR.dgelist = DGEList(counts = data,group = Group)

> edgeR.dgelist = calcNormFactors(edgeR.dgelist,method = "TMM")

> CommonDisp <- estimateGLMCommonDisp(edgeR.dgelist, design.matrix)

Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  :
Design matrix not of full rank.  The following coefficients not estimable:
method

R edgeR bioconductor • 3.2k views
6.5 years ago
russhh 5.7k

The mathematical stuff:

The fifth column of your design matrix is the sum of the third and fourth columns. So there is a linear combination of columns 3, 4, and 5, with non-zero coefficients, that gives the zero vector (col3 + col4 - col5 = 0). The columns are therefore linearly dependent, and the design matrix is not of full rank.
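The dependency can be checked numerically. The sketch below rebuilds the design from the question using base R only; it assumes the default column order that `model.matrix` produces for these factor levels, and it compares the matrix rank from `qr()` with the number of columns.

```r
# Rebuild the sample table from the question (base R only).
samples <- c("L4_D49_rep_1", "L4_D49_rep_2",
             "L6_L8_D49_rep_1", "L6_L8_D49_rep_2",
             "Q3_D49_rep_1", "Q3_D49_rep_2",
             "Q5_Q7_D49_rep_1", "Q5_Q7_D49_rep_2")
design <- data.frame(
  Group  = factor(rep(c("L4.L", "L6_L8.L", "Q3.Q", "Q5_Q7.Q"), each = 2)),
  method = factor(rep(c("L", "Q"), each = 4)),
  row.names = samples
)

# Same formula as in the question.
design.matrix <- model.matrix(~ 0 + Group + method, design)

ncol(design.matrix)      # 5 columns
qr(design.matrix)$rank   # rank 4, so one coefficient cannot be estimated

# Columns 3 and 4 add up to column 5 in every row.
all(design.matrix[, 3] + design.matrix[, 4] == design.matrix[, 5])
```

A rank lower than the number of columns is the condition that `glmFit` reports as "Design matrix not of full rank".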

The scientific stuff:

All your "L4" and "L6_L8" samples were assessed with method "L", and all your "Q3" and "Q5_Q7" samples were assessed with method "Q". It's therefore not possible to distinguish the effect of method "L" vs method "Q", because the method-level differences are completely confounded with the sample-level differences. You haven't explained what your methods/samples correspond to (so I might have missed an important design detail), but to assess the effects of Q vs L, you'd typically assess all your samples with both methods.
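One possible way around the error, offered as a sketch rather than a recommendation for this particular experiment, is to drop `method` from the formula and express the comparisons as contrasts between the four group coefficients. The contrast names and the hand-built contrast matrix below are illustrative assumptions. The same matrix could also be built with `makeContrasts` from limma. Note that the "QvsL" contrast here only compares the average of the Q groups with the average of the L groups, so it remains confounded with sample differences, as described above.

```r
# Group-only design: one coefficient per group, no separate method term.
Group <- factor(rep(c("L4.L", "L6_L8.L", "Q3.Q", "Q5_Q7.Q"), each = 2))
design.matrix <- model.matrix(~ 0 + Group)
colnames(design.matrix) <- levels(Group)

# This design is of full rank (rank equals number of columns).
qr(design.matrix)$rank == ncol(design.matrix)

# Hypothetical contrast matrix: rows are group coefficients, columns are comparisons.
contrasts <- cbind(
  L4vsL6_L8 = c(1, -1, 0, 0),
  Q3vsQ5_Q7 = c(0, 0, 1, -1),
  QvsL      = c(-0.5, -0.5, 0.5, 0.5)  # mean of Q groups minus mean of L groups
)
rownames(contrasts) <- levels(Group)
contrasts
```

Each column of `contrasts` could then be passed as the `contrast` argument when testing a fitted edgeR GLM.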