Problem with design for DESeq2
2
0
Entering edit mode
3.3 years ago

I am trying to use the DESeq2 package to compute differentially expressed genes from raw RNA-Seq counts, and I am having trouble setting up the design. I have 57 samples consisting of 2 diseases groups, 19 patients, and 3 tissue types. An example of my design:

Sample  Disease   Patient   Tissue
1       A         1         x
2       A         1         y
3       A         1         z
4       A         2         x
5       A         2         y
6       A         2         z
7       A         3         x
8       A         3         y
9       A         3         z
10      A         4         x
11      A         4         y
12      A         4         z
13      A         5         x
14      A         5         y
15      A         5         z
16      A         6         x
17      A         6         y
18      A         6         z
19      A         7         x
20      A         7         y
21      A         7         z
22      A         8         x
23      A         8         y
24      A         8         z
25      A         9         x
26      A         9         y
27      A         9         z
28      A         10        x
29      A         10        y
30      A         10        z
31      A         11        x
32      A         11        y
33      A         11        z
34      A         12        x
35      A         12        y
36      A         12        z
37      A         13        x
38      A         13        y
39      A         13        z
40      B         14        x
41      B         14        y
42      B         14        z
43      B         15        x
44      B         15        y
45      B         15        z
46      B         16        x
47      B         16        y
48      B         16        z
49      B         17        x
50      B         17        y
51      B         17        z
52      B         18        x
53      B         18        y
54      B         18        z
55      B         19        x
56      B         19        y
57      B         19        z

The comparisons of interest are pairwise comparisons between the interactions of Disease and Tissue (A.x vs A.y, A.x vs A.z, B.x vs. B.y, etc.) taking into account the pairing of the tissues by each patient. I also want to compare similar tissues across diseases (A.x vs B.x, A.y vs B.y) but without the pairing of course. I tried the following design in the DESeq2 workflow:

~Patient + Disease:Tissue

but I received the following error:

Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  Levels or combinations of levels without any samples have resulted in
  column(s) of zeros in the model matrix.

So I read the DESeq2 vignette and attempted their solution by creating a new variable for patients nested within groups:

Sample  Disease   Patient   Tissue   Patient.n
1       A         1         x        1
2       A         1         y        1
3       A         1         z        1
4       A         2         x        2
5       A         2         y        2
6       A         2         z        2
7       A         3         x        3
8       A         3         y        3
9       A         3         z        3
10      A         4         x        4
11      A         4         y        4
12      A         4         z        4
13      A         5         x        5
14      A         5         y        5
15      A         5         z        5
16      A         6         x        6
17      A         6         y        6
18      A         6         z        6
19      A         7         x        7
20      A         7         y        7
21      A         7         z        7
22      A         8         x        8
23      A         8         y        8
24      A         8         z        8
25      A         9         x        9
26      A         9         y        9
27      A         9         z        9
28      A         10        x        10
29      A         10        y        10
30      A         10        z        10
31      A         11        x        11
32      A         11        y        11
33      A         11        z        11
34      A         12        x        12
35      A         12        y        12
36      A         12        z        12
37      A         13        x        13
38      A         13        y        13
39      A         13        z        13
40      B         14        x        1
41      B         14        y        1
42      B         14        z        1
43      B         15        x        2
44      B         15        y        2
45      B         15        z        2
46      B         16        x        3
47      B         16        y        3
48      B         16        z        3
49      B         17        x        4
50      B         17        y        4
51      B         17        z        4
52      B         18        x        5
53      B         18        y        5
54      B         18        z        5
55      B         19        x        6
56      B         19        y        6
57      B         19        z        6

with the following suggested design:

~Disease + Disease:Patient.n + Disease:Tissue

But I get the same error as above. Not sure what I'm doing here, any help is appreciated.

DESeq2 design • 1.0k views
ADD COMMENT
1
Entering edit mode

Patient 13 appears in A and B

ADD REPLY
0
Entering edit mode

Ah that was a typo when transferring the table here, good catch. I have fixed it in my original post. My actual design does not have that mistake

ADD REPLY
3
Entering edit mode
3.3 years ago

Because you have different numbers of patients in each Disease group, you are falling victim to the "Levels without samples" problem outlined in this section of the DESeq2 manual.

Because there is no data for patient.n=7 (or 8,9,10,11,12,13) and disease=B, you will generate columns in your design matrix where every entry is 0 for the diseaseB:patient.n7 column (as well as the same for the other patients). Thus you need to edit the design matrix to remove these columns. See the same DESeq2 manual entry linked above.

ADD COMMENT
0
Entering edit mode

I tried this and it works but I am getting strange results. If I just use a model without taking pairing into account (~Disease:Tissue), the top DEGs have p-values to the order of 10^-40. If I follow the instructions in the vignette, making my own design, dropping levels with all 0s, and assigning it to the "full" argument, the top DEGs have p-values to the order of 10^-10.

ADD REPLY
3
Entering edit mode

The more complex a model, the less powerful the analysis is. The Disease:Patient.n involves the estimation of 19 more parameters in the model, it will always be an empirical question of whether you lose more power modeling those extra 19 parameters than you gain by accounting for the patient specific effects.

ADD REPLY

Login before adding your answer.

Traffic: 1957 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6