Question: Problem with design for DESeq2
0
gravatar for 5cantaloupe5
7 weeks ago by
5cantaloupe50 wrote:

I am trying to use the DESeq2 package to compute differentially expressed genes from raw RNA-Seq counts, and I am having trouble setting up the design. I have 57 samples consisting of 2 diseases groups, 19 patients, and 3 tissue types. An example of my design:

Sample  Disease   Patient   Tissue
1       A         1         x
2       A         1         y
3       A         1         z
4       A         2         x
5       A         2         y
6       A         2         z
7       A         3         x
8       A         3         y
9       A         3         z
10      A         4         x
11      A         4         y
12      A         4         z
13      A         5         x
14      A         5         y
15      A         5         z
16      A         6         x
17      A         6         y
18      A         6         z
19      A         7         x
20      A         7         y
21      A         7         z
22      A         8         x
23      A         8         y
24      A         8         z
25      A         9         x
26      A         9         y
27      A         9         z
28      A         10        x
29      A         10        y
30      A         10        z
31      A         11        x
32      A         11        y
33      A         11        z
34      A         12        x
35      A         12        y
36      A         12        z
37      A         13        x
38      A         13        y
39      A         13        z
40      B         14        x
41      B         14        y
42      B         14        z
43      B         15        x
44      B         15        y
45      B         15        z
46      B         16        x
47      B         16        y
48      B         16        z
49      B         17        x
50      B         17        y
51      B         17        z
52      B         18        x
53      B         18        y
54      B         18        z
55      B         19        x
56      B         19        y
57      B         19        z

The comparisons of interest are pairwise comparisons between the interactions of Disease and Tissue (A.x vs A.y, A.x vs A.z, B.x vs. B.y, etc.) taking into account the pairing of the tissues by each patient. I also want to compare similar tissues across diseases (A.x vs B.x, A.y vs B.y) but without the pairing of course. I tried the following design in the DESeq2 workflow:

~Patient + Disease:Tissue

but I received the following error:

Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  Levels or combinations of levels without any samples have resulted in
  column(s) of zeros in the model matrix.

So I read the DESeq2 vignette and attempted their solution by creating a new variable for patients nested within groups:

Sample  Disease   Patient   Tissue   Patient.n
1       A         1         x        1
2       A         1         y        1
3       A         1         z        1
4       A         2         x        2
5       A         2         y        2
6       A         2         z        2
7       A         3         x        3
8       A         3         y        3
9       A         3         z        3
10      A         4         x        4
11      A         4         y        4
12      A         4         z        4
13      A         5         x        5
14      A         5         y        5
15      A         5         z        5
16      A         6         x        6
17      A         6         y        6
18      A         6         z        6
19      A         7         x        7
20      A         7         y        7
21      A         7         z        7
22      A         8         x        8
23      A         8         y        8
24      A         8         z        8
25      A         9         x        9
26      A         9         y        9
27      A         9         z        9
28      A         10        x        10
29      A         10        y        10
30      A         10        z        10
31      A         11        x        11
32      A         11        y        11
33      A         11        z        11
34      A         12        x        12
35      A         12        y        12
36      A         12        z        12
37      A         13        x        13
38      A         13        y        13
39      A         13        z        13
40      B         14        x        1
41      B         14        y        1
42      B         14        z        1
43      B         15        x        2
44      B         15        y        2
45      B         15        z        2
46      B         16        x        3
47      B         16        y        3
48      B         16        z        3
49      B         17        x        4
50      B         17        y        4
51      B         17        z        4
52      B         18        x        5
53      B         18        y        5
54      B         18        z        5
55      B         19        x        6
56      B         19        y        6
57      B         19        z        6

with the following suggested design:

~Disease + Disease:Patient.n + Disease:Tissue

But I get the same error as above. Not sure what I'm doing here, any help is appreciated.

design deseq2 • 204 views
ADD COMMENTlink modified 7 weeks ago • written 7 weeks ago by 5cantaloupe50
1

Patient 13 appears in A and B

ADD REPLYlink written 7 weeks ago by Asaf8.5k

Ah that was a typo when transferring the table here, good catch. I have fixed it in my original post. My actual design does not have that mistake

ADD REPLYlink modified 7 weeks ago • written 7 weeks ago by 5cantaloupe50
3
gravatar for i.sudbery
7 weeks ago by
i.sudbery11k
Sheffield, UK
i.sudbery11k wrote:

Because you have different numbers of patients in each Disease group, you are falling victim to the "Levels without samples" problem outlined in this section of the DESeq2 manual.

Because there is no data for patient.n=7 (or 8,9,10,11,12,13) and disease=B, you will generate columns in your design matrix where every entry is 0 for the diseaseB:patient.n7 column (as well as the same for the other patients). Thus you need to edit the design matrix to remove these columns. See the same DESeq2 manual entry linked above.

ADD COMMENTlink written 7 weeks ago by i.sudbery11k

I tried this and it works but I am getting strange results. If I just use a model without taking pairing into account (~Disease:Tissue), the top DEGs have p-values to the order of 10^-40. If I follow the instructions in the vignette, making my own design, dropping levels with all 0s, and assigning it to the "full" argument, the top DEGs have p-values to the order of 10^-10.

ADD REPLYlink written 7 weeks ago by 5cantaloupe50
3

The more complex a model, the less powerful the analysis is. The Disease:Patient.n involves the estimation of 19 more parameters in the model, it will always be an empirical question of whether you lose more power modeling those extra 19 parameters than you gain by accounting for the patient specific effects.

ADD REPLYlink written 7 weeks ago by i.sudbery11k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1736 users visited in the last hour
_