Question

design matrix not of full RANK in paired design

0

Entering edit mode

2.4 years ago

jfertaj ▴ 110

Hello,

I have an experiment with paired samples (normal vs tumour). These are biopsies from patients, one biopsy from normal tissue and another biopsy from tumor tissue. These patients have different response to a treatment (respuesta, recidiva). I would like to find which are the differential expressed genes between respuesta and recidiva but accounting for the paired samples

Also, I would like to compare the DEGs in those patients in respuesta status but depending on the treatment, i.e, which are the genes differentially expressed between treatments in those patients that are responders (patients with respuesta).

I have tried to get the design matrix for the first question (respuesta vs recidiva) but I got an error that the matrix is not full rank. I have followed examples from limma and edgeR but I am missing something.

Also I would like to know how should be the design matrix and the contrasts for the second analyses: differences between treatments. My number of different treatments is three, although in the below example there are only two different.

    head(summarydata, n=12)
        sample tissue     names respuesta tratamiento Proyect                                      files
    1       04      N       04S RESPUESTA      CIR+RT   fase1       salmon_results/totalRNA/04S/quant.sf
    2       04      T       04T RESPUESTA      CIR+RT   fase1       salmon_results/totalRNA/04T/quant.sf
    3       10      N       10S RESPUESTA         CIR   fase1       salmon_results/totalRNA/10S/quant.sf
    4       10      T       10T RESPUESTA         CIR   fase1       salmon_results/totalRNA/10T/quant.sf
    5  11FIDIS      N 11S-FIDIS  RECIDIVA         CIR   fase1 salmon_results/totalRNA/11S-FIDIS/quant.sf
    6  11FIDIS      T 11T-FIDIS  RECIDIVA         CIR   fase1 salmon_results/totalRNA/11T-FIDIS/quant.sf
    7       12      N       12S RESPUESTA         CIR   fase1       salmon_results/totalRNA/12S/quant.sf
    8       12      T       12T RESPUESTA         CIR   fase1       salmon_results/totalRNA/12T/quant.sf
    9  12FIDIS      N 12S-FIDIS RESPUESTA      CIR+RT   fase1 salmon_results/totalRNA/12S-FIDIS/quant.sf
    10 12FIDIS      T 12T-FIDIS RESPUESTA      CIR+RT   fase1 salmon_results/totalRNA/12T-FIDIS/quant.sf
    11 13FIDIS      N 13S-FIDIS  RECIDIVA         CIR   fase1 salmon_results/totalRNA/13S-FIDIS/quant.sf
    12 13FIDIS      T 13T-FIDIS  RECIDIVA         CIR   fase1 salmon_results/totalRNA/13T-FIDIS/quant.sf

And I tried these two design matrices:

y <- makeDGEList(gse) # gse is a summarizedExperiment created using *tximeta*

# design matrix for running a first comparison Respuesta vs Recidiva
Subject <- factor(summarydata$sample)
Treat <- factor(summarydata$respuesta, levels=c("RESPUESTA","RECIDIVA"))
tissue <- factor(summarydata$tissue, levels=c("N", "T"))

# 1st try
design1 <- model.matrix(~Subject+tissue+Treat)
rownames(design1) <- colnames(y)

# 2nd try
design2 <- model.matrix(~Subject+Treat)
rownames(design2) <- colnames(y)

And this is the error I get when using any of the two above mentioned design matrices:

Error in glmFit.default(sely, design, offset = seloffset, dispersion = 0.05,  : 
  Design matrix not of full rank.  The following coefficients not estimable:
 TreatRECIDIVA

Many thanks in advance

limma edgeR • 1.3k views

ADD COMMENT • link 2.4 years ago by jfertaj ▴ 110

score 4 · Accepted Answer · 2021-11-25

4

Entering edit mode

2.4 years ago

Gordon Smyth ★ 7.0k

You have a multi-level experiment, see Section 9.7 of the limma User's Guide or Section 3.5 of the edgeR User's Guide.

ADD COMMENT • link 2.4 years ago by Gordon Smyth ★ 7.0k

0

Entering edit mode

Dear Gordon Smyth, thanks a lot for your help, I have tried to follow the steps for both edgeR and limma, and for limma I get his error, I have followed a voom-limma approach. Maybe I am misunderstanding something very obvious

### STEPS TO RUN VOOM-LIMMA APPROACH

gse <- summarizeToGene(se, countsFromAbundance="lengthScaledTPM")
dge <- DGEList(assays(gse)[["counts"]])

# design for limma
Treat <- factor(paste(summarydata$respuesta,summarydata$tissue,sep="."))
design <- model.matrix(~0+Treat)
colnames(design) <- levels(Treat)

## Apply voom normalization
dge <- calcNormFactors(dge)

# filtering steps
keep <- filterByExpr(dge, design)
dge <- dge[keep,,keep.lib.sizes=FALSE]

# apply voom
v <- voom(dge, design, plot=TRUE)

# apply duplicateCorrelation
corfit <- duplicateCorrelation(v, design, block=summarydata$sample)

# apply voom again (with the block and correlation parameters this time)
v <- voom(dge, design, block = summarydata$sample, correlation = corfit$consensus)

# apply lmFit
fit <- lmFit(v,design,block=summarydata$sample,correlation=corfit$consensus)

cm <- makeContrasts(
                    RecidivavsRespuestaForNormal = RECIDIVA.N-RESPUESTA.N, 
                    RecidivavsRespuestaForTumor = RECIDIVA.T-RESPUESTA.T, 
                    TumorvsNormalForRespuesta = RESPUESTA.T-RESPUESTA.N, 
                    TumorvsNormalForRecidiva = RECIDIVA.T-RECIDIVA.N,
                    levels=design)

fit2 <- contrasts.fit(fit, contrasts)

The error that I got is this: Error in contrasts.fit(fit, contrasts) : anyNA() applied to non-(list or vector) of type 'closure'

In case you need the summarydata file, you can find it here summarydata.txt

In addition, I have also tried the edgeR approach but still got the same error of matrix not of full rank. Here are the steps I am doing

# For we define the experimental factors:
Patient <- factor(summarydata$sample) 
Disease <- factor(summarydata$tissue, levels=c("N","T"))
Treatment <- factor(summarydata$respuesta, levels=c("RESPUESTA", "RECIDIVA"))

# We need to adjust for baseline differences between the patients, 
# so the first step is to initialize the design matrix with patient effects:
design <- model.matrix(~Patient)

# Then we define disease-specific treatment effects and append them to the design matrix:
Respuesta.Tumor <- Disease== "T" & Treatment=="RESPUESTA" 
Respuesta.Normal <- Disease=="N" & Treatment=="RESPUESTA"
design <- cbind(design, Respuesta.Tumor, Respuesta.Normal)

#Filterout low count genes
keep <-filterByExpr(y, group = Treatment)
y <- y[keep,keep.lib.size=FALSE]

y <- estimateDisp(y,design, robust=TRUE)

ADD REPLY • link 2.4 years ago by jfertaj ▴ 110

0

Entering edit mode

In the limma approach, you've simply made a typo. You intended

fit2 <- contrasts.fit(fit, cm)

instead of

fit2 <- contrasts.fit(fit, contrasts)

By the way, you can simplify the code by using

fit <- voomLmFit(dge, design, block=summarydata$sample)
fit2 <- contrasts.fit(fit, cm)

in which case voom and duplicateCorrelation will be run automatically.

In the edgeR approach, you have mixed up the within patient and between patient effects. You need to create two effects, one for each patient group. You can't create separate effects for tumor and normal because tumor and normal are within-patient conditions. You need instead

Tumor.Respuesta <- Disease== "T" & Treatment=="RESPUESTA" 
Tumor.Recidiva <- Disease== "T" & Treatment=="RECIDIVA"

ADD REPLY • link 2.4 years ago by Gordon Smyth ★ 7.0k

0

Entering edit mode

Thanks a lot Gordon Smyth, I didn't try the edgeR approach but with the Limma one it worked!!

ADD REPLY • link 2.4 years ago by jfertaj ▴ 110