Design matrix to remove technical replicate effect
4 months ago
ARD ▴ 10

I have two groups (Control and Disease). Each group has 4 individuals (A,B,C and D; X,Y,Z and P). Each subject has 3 technical replicates.

Can I use the following model matrix to remove Individual's effect?

model.matrix(~Group + Individual)


where Group is a factor and Individual is a numeric vector?

edgeR • 718 views

Sample details.

Sample     Group       Individual
----------------------------------------
A1        Control           a
A2        Control           a
A3        Control           a
B1        Control           b
B2        Control           b
B3        Control           b
C1        Control           c
C3        Control           c
...             ...             ...
X1            Disease           x
X2            Disease           x
X3            Disease           x
Y1            Disease           y
Y2            Disease           y
...           ...              ...


Any differences/similarities between the following two approaches?

model.matrix(~Group) where Group is a factor (Control and Disease)

contrast(-1,1)

model.matrix(~Subject) where Subject is a factor (a, b, c, d, x, y, z, p)

contrast((x+y+z+p)-(a+b+c+d))


I'm pretty sure the results here would be identical.
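One way to check this is on simulated data. The sketch below uses plain `lm` on made-up log-expression values for a single gene rather than edgeR. It assumes a balanced design, a no-intercept `~0 + subject` parameterisation, and a contrast that averages (rather than sums) the subject coefficients. All variable names are hypothetical. Under those assumptions the estimated Disease vs Control difference agrees between the two fits, but the residual degrees of freedom differ, so the standard errors and p-values need not match.

```r
set.seed(1)

# Layout from the question: 8 subjects x 3 technical replicates
subj_levels <- c("a", "b", "c", "d", "x", "y", "z", "p")
subject <- factor(rep(subj_levels, each = 3), levels = subj_levels)
group   <- factor(rep(c("Control", "Disease"), each = 12))

# Simulated values for one gene (illustrative only):
# a subject-level effect plus smaller technical noise
subj_effect <- rnorm(8, sd = 1)
y <- subj_effect[as.integer(subject)] + rnorm(24, sd = 0.2)

# Approach 1: ~Group, coefficient for Disease vs Control
fit_group <- lm(y ~ group)
est_group <- unname(coef(fit_group)["groupDisease"])
se_group  <- summary(fit_group)$coefficients["groupDisease", "Std. Error"]

# Approach 2: one coefficient per subject, then an averaging contrast
fit_subject <- lm(y ~ 0 + subject)
w <- c(-1, -1, -1, -1, 1, 1, 1, 1) / 4
est_subject <- sum(w * coef(fit_subject))
se_subject  <- sqrt(drop(t(w) %*% vcov(fit_subject) %*% w))

# Same point estimate, different residual degrees of freedom
c(est_group = est_group, est_subject = est_subject)
c(df_group = df.residual(fit_group), df_subject = df.residual(fit_subject))
c(se_group = se_group, se_subject = se_subject)
```

The subject-level fit measures residual variation only between technical replicates of the same individual, whereas the group-level fit also includes variation between individuals.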

4 months ago

The short answer is no, you will get a "model matrix not full rank" error.

The long answer is that there are several things wrong with the approach you suggest. The first is that Individual is not an ordinal variable: there is no reason to believe that individual 6 has a larger "individual" effect than individual 1, yet using a numeric vector would imply exactly that. Thus Individual should be converted to a factor.

This will still lead to an error, because Control = Individual1 + Individual2 + Individual3 + Individual4 and Disease = Individual5 + Individual6 + Individual7 + Individual8. That is, the variables are not linearly independent, and it is impossible for edgeR to know whether variation comes from the Group variable or the Individual variable.
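The dependency can be seen directly in base R, without running edgeR. This is a minimal sketch assuming the layout described in the question (2 groups, 4 individuals each, 3 technical replicates); the variable names are illustrative.

```r
# Layout from the question: 8 individuals x 3 technical replicates
ind_levels <- c("a", "b", "c", "d", "x", "y", "z", "p")
Individual <- factor(rep(ind_levels, each = 3), levels = ind_levels)
Group      <- factor(rep(c("Control", "Disease"), each = 12))

design <- model.matrix(~ Group + Individual)

ncol(design)     # 9 columns: intercept, GroupDisease, 7 individual columns
qr(design)$rank  # 8: one column is a linear combination of the others

# GroupDisease is exactly the sum of the Disease individuals' indicator columns
all(design[, "GroupDisease"] ==
      rowSums(design[, c("Individualx", "Individualy",
                         "Individualz", "Individualp")]))
```

Because the rank (8) is smaller than the number of columns (9), at least one coefficient cannot be estimated, which is what the "not of full rank" message reports.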

There are generally two approaches to technical replicates. The most common is simply to merge them by summing their counts, so that each of your samples behaves a bit like an average of its technical replicates. As sums of negative binomial variables are also negative-binomially distributed, this is fine from a stats point of view.
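A minimal sketch of the merging step in base R, assuming a genes x libraries count matrix whose columns are ordered as in the question. The counts here are simulated purely for illustration and all object names are hypothetical. edgeR also provides a `sumTechReps` helper that should do the same job on a matrix or `DGEList`; check its help page for the exact arguments.

```r
set.seed(42)

# Which individual each of the 24 libraries came from
individual <- rep(c("a", "b", "c", "d", "x", "y", "z", "p"), each = 3)

# Simulated counts (illustrative only): 5 genes x 24 libraries
counts <- matrix(rpois(5 * 24, lambda = 50), nrow = 5,
                 dimnames = list(paste0("gene", 1:5),
                                 paste0(toupper(individual), 1:3)))

# Sum the technical replicates of each individual:
# rowsum() sums rows by group, so transpose before and after
merged <- t(rowsum(t(counts), group = individual, reorder = FALSE))
dim(merged)  # 5 genes x 8 individuals

# After merging there is one column per individual,
# so the design only needs the Group factor
group_merged <- factor(rep(c("Control", "Disease"), each = 4))
design <- model.matrix(~ group_merged)
```

The merged matrix and the two-column design can then go through the usual edgeR workflow with 4 biological replicates per group.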

The alternative is to use the duplicateCorrelation function in limma. In this approach, the individual that the technical replicates come from is treated as a random effect in a mixed model. Obviously, to do this you'd have to use limma/limma-voom rather than edgeR.
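A rough sketch of the limma-voom route, assuming the limma package is installed (duplicateCorrelation may also need statmod). The counts are simulated only so the example is self-contained, and the object names are illustrative. The limma user guide describes refinements, such as re-running voom with the estimated correlation, that are left out here.

```r
library(limma)

set.seed(1)

# Layout from the question: 8 individuals x 3 technical replicates
individual <- factor(rep(c("a", "b", "c", "d", "x", "y", "z", "p"), each = 3))
group      <- factor(rep(c("Control", "Disease"), each = 12))

# Simulated counts (illustrative only): 200 genes x 24 libraries
counts <- matrix(rnbinom(200 * 24, mu = 100, size = 10), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), NULL))

# The design contains only Group; Individual enters as the blocking variable
design <- model.matrix(~ group)
v <- voom(counts, design)

# Estimate the correlation between libraries from the same individual
corfit <- duplicateCorrelation(v, design, block = individual)
corfit$consensus.correlation

# Fit the model using that correlation, then test Disease vs Control
fit <- lmFit(v, design, block = individual,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)
topTable(fit, coef = "groupDisease")
```

Note that Individual never appears in the design matrix, so the rank problem does not arise.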


Thank you Ian. As you mentioned, I get the same error with "Individual" as a factor: "There is a strict linear dependency in your data. Design matrix not of full rank. The following coefficients not estimable..."


Yes, he explained why in the answer above. Either combine them or use duplicateCorrelation. I feel like combining would be more appropriate, but in the end, if duplicateCorrelation gives more DEGs (or any at all), then at least you have something to work with, so whatever floats your boat. The method has been around for a long time, so it's apparently not terrible.