How to structure my edgeR model to incorporate a temporal variable and 2 categorical variables?
1
0
Entering edit mode
6 months ago
O.rka ▴ 390

My experiment is the following:

• Temperature 26C vs. Temperature 30C
• Treatment Saline vs. Treatment BMC
• Timepoints 0, 2, and 3.

My intention is to create GLM model to look at differential abundance between groups:

1. 26C vs. 30C for Saline treatment
2. 26C vs. 30C for BMC treatment
3. Saline vs. BMC at 26C
4. Saline vs. BMC at 30C

In doing so, I would like to include all the samples possible when calculating the dispersion which is why I'm using a GLM instead of a Fisher's Exact Test for subsets of samples. I would also like to incorporate the ordered time information.

# Functions
return(df)
}

# Counts
# OG0000000 OG0000001   OG0000002   OG0000003   OG0000004
# T2_10_SALINE_TEMP-PE-D710-D505-1_S10  16909   55  5382    5894    1964
# T2_11_BMC_CONTROL-PE-D711-D505-1_S11  24296   60  2772    3962    1374
# T2_12_BMC_CONTROL-PE-D712-D505-1_S12  24619   60  7351    5389    560
# T2_13_BMC_CONTROL-PE-D701-D506-1_S13  22420   15  2172    2778    930
# T2_14_BMC_CONTROL-PE-D702-D506-1_S14  20049   82  4655    6211    553

#   temperature treatment   collection_time_numeric
# 1_T0_RNA-PE-D711D501-1_S143   26C NaN 0
# 2_T0_RNA-PE-D709D506-1_S134   26C NaN 0
# 4_T0_RNA-PE-D709D505-1_S133   26C NaN 0
# 5_T0_RNA-PE-D709D504-1_S132   26C NaN


If the T0 timepoint is throwing everything off then it can be removed.

I'm trying to figure out how to do this from the following sources: https://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

However, my situation isn't described so please no RTFM responses. If I make a design matrix, it will basically be a binary vector for Treatment_BMC, a binary vector for Treatment_30C, and a numeric vector for the collection time.

If I use this as the design matrix when calculating the dispersion, then how would I for example do #1 above where I calculate the 26C vs. 30C for just the Saline treatment? Does this not make sense to calculate the dispersion for everything?

I could use some guidance a bit here.

edgeR differential-gene-expression glm r • 601 views
0
Entering edit mode
0
Entering edit mode

I tried deleting the other one but couldn't find a delete. Is it possible to delete my other post that you referenced. Apologies.

4
Entering edit mode
6 months ago
Gordon Smyth ★ 3.5k

Your experiment is actually pretty standard. There are 12 treatment groups (2 temperatures x 2 treatments x 3 times) and you can setup the design matrix as explained in Section 3.3 of the edgeR User's Guide. The design matrix will have 1 column for each group.

However NaN values are not allowed in the metadata. You either have to fill in all the treatment values or else remove those samples.

0
Entering edit mode

When I do this, I end up with the error described here (https://stats.stackexchange.com/questions/521917/understanding-why-design-matrix-not-of-full-rank-the-following-coefficients-no?noredirect=1#comment961380_521917). Dropping the T0 makes sense. However, my design matrix isn't full rank when I have [Treatment_Saline, Treatment_BMC, Temperature_26C, Temperature_30C] because of the Temperature_30C column. If I remove this column, then I am able to run the models. However, if I remove the column I don't know how I can look at the effects of say 30C vs. 26C in the Saline treatment. I'm using edgeR.glmLRT(fit, contrast=limma.makeContrasts(f"{treatment_class} - {reference_class}", levels=pandas_to_rpy2(dm) ) ) to do the contrasts.

1
Entering edit mode

No, you haven't followed my advice. If you had, there wouldn't be a Temperature_30C column nor a problem with linear dependence.

You don't have to drop T0 (I wouldn't) but you obviously can't have NaNs.

edgeR is an R package that is part of Bioconductor. You are using a port of edgeR to Python and I cannot help you with Python code. In R, forming the design matrix is no problem and follows a standard process:

> library(edgeR)
> targets$treatment[1:4] <- "None" > Group <- paste(targets$treatment,targets$temperature,targets$collection_time,sep=".")
> Group <- factor(Group)
> design <- model.matrix(~0+Group)
> colnames(design) <- levels(Group)
> table(Group)
Group
BMC.26C.2    BMC.26C.3    BMC.30C.2    BMC.30C.3   None.26C.0
5            5            5            5            4
Saline.26C.2 Saline.26C.3 Saline.30C.2 Saline.30C.3
5            6            5            5
> is.fullrank(design)
[1] TRUE


If you created the design matrix in this standard way then you would have no problems forming the contrasts you wish to make.

> design
BMC.26C.2 BMC.26C.3 BMC.30C.2 BMC.30C.3 None.26C.0 Saline.26C.2 Saline.26C.3 Saline.30C.2 Saline.30C.3
1          0         0         0         0          1            0            0            0            0
2          0         0         0         0          1            0            0            0            0
3          0         0         0         0          1            0            0            0            0
4          0         0         0         0          1            0            0            0            0
5          0         0         0         1          0            0            0            0            0
6          0         0         0         1          0            0            0            0            0
7          0         0         0         0          0            0            0            0            1
8          0         0         0         1          0            0            0            0            0
9          0         0         0         0          0            0            0            0            1
10         0         0         0         1          0            0            0            0            0
11         0         0         0         0          0            0            0            0            1
12         0         0         0         0          0            0            0            0            1
13         0         0         0         0          0            0            1            0            0
14         0         0         0         0          0            0            1            0            0
15         0         1         0         0          0            0            0            0            0
16         0         0         0         0          0            0            1            0            0
17         0         1         0         0          0            0            0            0            0
18         0         1         0         0          0            0            0            0            0
19         0         0         0         0          0            0            1            0            0
20         0         1         0         0          0            0            0            0            0
21         0         0         0         0          0            0            1            0            0
22         0         0         0         0          0            1            0            0            0
23         0         0         0         0          0            1            0            0            0
24         0         0         0         0          0            1            0            0            0
25         0         0         0         0          0            1            0            0            0
26         0         0         0         0          0            1            0            0            0
27         0         0         0         0          0            0            0            1            0
28         0         0         0         0          0            0            0            1            0
29         0         0         0         0          0            0            0            1            0
30         0         0         0         0          0            0            0            1            0
31         0         0         0         0          0            0            0            1            0
32         1         0         0         0          0            0            0            0            0
33         1         0         0         0          0            0            0            0            0
34         1         0         0         0          0            0            0            0            0
35         1         0         0         0          0            0            0            0            0
36         1         0         0         0          0            0            0            0            0
37         0         0         1         0          0            0            0            0            0
38         0         0         1         0          0            0            0            0            0
39         0         0         1         0          0            0            0            0            0
40         0         0         1         0          0            0            0            0            0
41         0         0         1         0          0            0            0            0            0
42         0         0         0         0          0            0            1            0            0
43         0         0         0         0          0            0            0            0            1
44         0         1         0         0          0            0            0            0            0
45         0         0         0         1          0            0            0            0            0

0
Entering edit mode

This makes sense thank you! The design matrix gets expanded out to include the time point info. I'll just make a wrapper to use the R model.matrix function and if that doesn't work I'll just use R entirely.