Understanding why `Design matrix not of full rank. The following coefficients not estimable` during GLM
0
0
Entering edit mode
3.0 years ago
O.rka ▴ 710

My experiment is the following:

  • Temperature 26C vs. Temperature 30C
  • Treatment Saline vs. Treatment BMC
  • Timepoints 0, 2, and 3.

My intention is to create GLM model to look at differential abundance between groups:

  1. 26C vs. 30C for Saline treatment
  2. 26C vs. 30C for BMC treatment
  3. Saline vs. BMC at 26C
  4. Saline vs. BMC at 30C

In doing so, I would like to include all the samples possible when calculating the dispersion which is why I'm using a GLM instead of a Fisher's Exact Test for subsets of samples. I would also like to incorporate the ordered time information.

I'm trying to understand why my GLM model is failing. I tried using all of my data including the baseline T0 which was an issue with my design matrix because it was not of full rank. I understand why the baseline was an issue as it didn't add any information for any of the other categories. However, I'm not sure why this occurs for Temperature_30C.

Here's my data, code, and experiment design. I'm using Python as my main environment with an R package called edgeR in the backend using an interface called rpy2 if this helps having this info.

from rpy2 import robjects as ro
from rpy2 import rinterface as ri
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
R = ro.r
NULL = ri.NULL

random_state=0
ro.r('set.seed')(random_state)

# Package
edgeR = importr("edgeR")
limma = importr("limma")


def pandas_to_rpy2(df, rpy2_version_major=None):
    if rpy2_version_major is None:
        from rpy2 import __version__ as rpy2_version
        rpy2_version_major = int(rpy2_version.split(".")[0])
    if rpy2_version_major == 2: # v2.x
        return pandas2ri.py2ri(df)
    if rpy2_version_major == 3: # v3.x
        from rpy2.robjects.conversion import localconverter

        with localconverter(ro.default_converter + pandas2ri.converter):
            return ro.conversion.py2rpy(df)

def build_glm_model(X, design_matrix, random_state=0):

    components = X.columns

    # Run edgeR
    r_X = pandas_to_rpy2(X.T)
    r_components = ro.vectors.StrVector(components)
    r_design = pandas_to_rpy2(design_matrix)

    # Give DGEList the gene expression table with groups
    model = edgeR.DGEList(
        counts= r_X, 
        genes = r_components, 
#         group = r_y,
    )

    # Normalize the expression values
#     if kws["calculate_normalization_factors"]:
    model = edgeR.calcNormFactors(model)

    # Calculate dispersion
    model = edgeR.estimateGLMCommonDisp(model, r_design)

#     if kws["estimate_glm_trended_dispersion"]:
    model = edgeR.estimateGLMTrendedDisp(model, r_design)

    # Estimate tagwise dispersion
#     if kws["estimate_glm_tagwise_dispersion"]:
    model = edgeR.estimateGLMTagwiseDisp(model, r_design)

    # Fit
    fit = edgeR.glmFit(model, r_design)

    return (model, fit)

# Counts
X = pd.read_csv("https://pastebin.com/raw/J7kmL8Ly", sep="\t", index_col=0)

# Metadata
df_metadata = pd.read_csv("https://pastebin.com/raw/PANaC3r5", sep="\t", index_col=0)
design_matrix = pd.concat([
    pd.get_dummies(df_metadata["treatment"].fillna("Baseline"), prefix="Treatment"),
    pd.get_dummies(df_metadata["temperature"], prefix="Temperature"),
    df_metadata["collection_time_numeric"].to_frame("t"),
], axis=1).astype(int)

# Remove the baseline samples
design_matrix = design_matrix.drop("Treatment_Baseline", axis=1).query("t > 0")

# Create the GLM model 
model, fit = build_glm_model(X.loc[design_matrix.index], design_matrix)
# RRuntimeError: Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  : 
#   Design matrix not of full rank.  The following coefficients not estimable:
#  Temperature_30C
edgeR design-matrix differential glm offtopic • 1.5k views
ADD COMMENT
1
Entering edit mode

Also posted on Cross Validated https://stats.stackexchange.com/questions/521917/

ADD REPLY

Login before adding your answer.

Traffic: 3803 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6