DESeq2 - design matrix and interaction terms
1
0
Entering edit mode
12 months ago
pkfsantos • 0

Hi,

We are trying to answer if the genetic effects of brood and queen on bees' workers are similar or distinct. We have four experimental conditions (treatments): workers in the presence of a queen (CQ), workers in the presence of brood (CB), workers in the presence of both the queen and brood (CBQ), and workers without a queen or brood (C - control). There are six replicates for each treatment. This is how the data looks like:

ID  Treatment   Control Queen   Brood   Colony
5B   CB            0       0       1     45
7B   C             1       0       0     47
8B   C             1       0       0     47
9B   CB            0       0       1     47
10B  CB            0       0       1     47
15B  CB            0       0       1     45
16B  CQ            0       1       0     46
18B  C             1       0       0     45
19B  CB            0       0       1     46
22B  CQ            0       1       0     47
24B  C             1       0       0     45
26B  CB            0       0       1     46
27B  C             1       0       0     46
30B  C             1       0       0     46
33B  CBQ           0       1       1     45
37B  CQ            0       1       0     45
38B  CQ            0       1       0     47
39B  CBQ           0       1       1     47
41B  CBQ           0       1       1     45
42B  CBQ           0       1       1     46
43B  CQ            0       1       0     46
45B  CBQ           0       1       1     47
48B  CBQ           0       1       1     46
49B  CQ            0       1       0     45


I first analyzed the data performing a pairwise comparison among the treatments, controlling for the colony identity (using DESeq2). This approach was criticized by the reviewers and the paper was denied for publication. One of the reviewers suggested using the LRT model comparison function of Deseq2 to test the main effects of Queen presence, Brood presence, as well as their interaction.

When I try to use the variables "Control", "Queen" and, "Brood" rather than treatment, I get an error if I include the interaction between Queen and Brood. This is the code I am trying to run:

dds <- DESeqDataSetFromTximport(txi.rsem, colData = doe, design = ~ Colony + Control + Queen + Brood + Queen:Brood)

Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear combinations of the others and must be removed.


I have read about this error in the DESeq2 vignette and also a lot of similar posts. But it is still not clear for me if I can't do it with the kind of data that I have, or if there is another way to design the matrix and actually get the results from this interaction. I mean, not using 0 and 1 to represents queen or brood presence/absence, for example.

Next steps would be:

dds <- DESeq(dds, test = "LRT", reduced = ~ Colony + Control) #to get the effect of Queen:Brood interaction
dds <- DESeq(dds, test = "LRT", reduced = ~ Colony + Control + Brood ) #to get the effect of only the queen
dds <- DESeq(dds, test = "LRT", reduced = ~ Colony + Control + Queen) #to get the effect of only the brood


deseq2 rnaseq interactions LRT • 541 views
3
Entering edit mode
12 months ago

Hi,

I don't think you should use LRT test here, but hopefully my answer will be useful whether you chose to go back to a Wald test on coefficients (DESeq2's default) or stick to LRT test.

What the error tells you is that one of you design column is equivalent to a linear combination of other columns, which is redundant. Actually you just need to remove the "Control" column. Why ? Because when Control = 1, Queen and Brood are always equal to 0. So Control does not provide useful information. This should solve your issue.

0
Entering edit mode

Thank you very much Carlo. It worked when I removed the Control. Would you mind explaining why you don't think I should use LRT test?

3
Entering edit mode

LRT is superior to the Wald test when you want to test multiple terms at once, but I don't think it is the case here. So for me, it would be much more straightforward to test directly the Queen effect, the Brood effect and their interaction with Wald tests. IMHO, coefficients (log2FC) are easier to interpret that way, but maybe this comes to personal preferences at this point.

At the end of the day, if the reviewer wants you to do LRT, you might as well do it. The results should be very similar to the Wald tests anyway.

0
Entering edit mode

How to run LRT on single factor for example if I only have different time points six different time point, when I run the reduced model which one becomes my reference in this case , my conditions are designed as C1,C2,C3,C4,C5 and C6 respectively.

My idea is to test each of the time point against each other for example C1 vs The rest , C2 vs the rest . So would be correct to set a reference level ?