Question: DESeq sample organization
6.0 years ago by
andremrsantos0 wrote:

I am studying differential transcript expression between cancer and adjacent tissues. My samples are organized as follows:

            sample  type
Barcode_01  01      NT
Barcode_03  02      GC
Barcode_04  02      AD
Barcode_05  03      GC
Barcode_06  03      AD
Barcode_07  04      AD
Barcode_08  04      GC
Barcode_09  05      AD
Barcode_10  05      GC

where GC is gastric cancer tissue and AD is the adjacent tissue. I also have one non-cancerous sample (NT) that I wish to compare. Thus I need to compare:

AD x GC (where I need to account for within-sample variation)



However, when I load my data into DESeq, it returns the following error:

> raw <- DESeqDataSetFromMatrix(count,, ~ type + sample)

Error in DESeqDataSet(se, design = design, ignoreRank) :
  the model matrix is not full rank, so the model cannot be fit as specified.
  one or more variables or interaction terms in the design formula
  are linear combinations of the others and must be removed

Is there some way to organize my data so that my comparison accounts for within-sample variation?

Michael Love2.1k wrote:

Devon is right: this analysis is complicated by the fact that sample 1 and NT are confounded, so there's no way to model both effects.

There is a way to hack the column data to fit a model that controls for the sample differences in the GC and AD samples. Make sure that NT is the base level of type (see the vignette). Add a column to the column data: sample.nested = factor(c(1,1,1, 2,2, 3,3, 4,4)). Then use a design of ~ sample.nested + type, and run DESeq(dds, modelMatrixType="standard"). The AD vs GC results table is straightforward (use 'contrast'); the ones involving NT are a bit more complicated. If you were to ask for a simple contrast, results(dds, contrast=c("type","AD","NT")), this would only give the comparison within samples 1 and 2. You have to add 1/4 of each of the sample.nested effects listed in resultsNames(dds). So, for example, the numeric contrast for the AD vs NT comparison should be results(dds, contrast=c(0,1/4,1/4,1/4,1,0)).
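To make the coefficient order behind that numeric contrast easier to check, here is a minimal base R sketch. It only builds the column data and the design matrix; the column data are transcribed from the table in the question, and the names coldata, mm, ad_vs_nt and gc_vs_nt are made up for illustration. The GC vs NT vector is written by analogy with the AD vs NT one and is an assumption, not something stated in the answer. The DESeq2 calls are left as comments because they need the count matrix.

```r
# Column data transcribed from the question; NT is set as the base level of type.
coldata <- data.frame(
  sample = factor(c("01", "02", "02", "03", "03", "04", "04", "05", "05")),
  type   = factor(c("NT", "GC", "AD", "GC", "AD", "AD", "GC", "AD", "GC"),
                  levels = c("NT", "AD", "GC"))
)

# Nested sample factor from the answer: the NT sample shares level 1 with sample 02.
coldata$sample.nested <- factor(c(1, 1, 1, 2, 2, 3, 3, 4, 4))

# Design matrix for ~ sample.nested + type; its columns follow the same order
# as the coefficients the numeric contrast is written against.
mm <- model.matrix(~ sample.nested + type, data = coldata)
print(colnames(mm))

# This design is full rank, unlike ~ type + sample.
stopifnot(qr(mm)$rank == ncol(mm))

# Numeric contrast from the answer: 1/4 of each sample.nested effect plus AD vs NT.
ad_vs_nt <- setNames(c(0, 1/4, 1/4, 1/4, 1, 0), colnames(mm))
# Assumed analogue for GC vs NT (not given in the answer).
gc_vs_nt <- setNames(c(0, 1/4, 1/4, 1/4, 0, 1), colnames(mm))
print(ad_vs_nt)

# With DESeq2 loaded and a DESeqDataSet `dds` built with this design:
# dds <- DESeq(dds, modelMatrixType = "standard")
# resultsNames(dds)  # check that the order matches colnames(mm)
# res <- results(dds, contrast = unname(ad_vs_nt))
```

It is worth comparing resultsNames(dds) against colnames(mm) before trusting the numeric vector, since a numeric contrast is matched to coefficients by position only.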

modified 6.0 years ago • written 6.0 years ago by Michael Love

I've tried to perform this analysis in two steps.

First, I loaded the data normally with the design ~ type and performed the AD x NT and GC x NT comparisons.

Later, I loaded the data excluding the first sample, used the design ~ type + sample, and compared AD x NT. Is this wrong? Or should I use your trick?

Reply written 6.0 years ago by andremrsantos
Devon Ryan97k wrote:

You need to remove sample 1. A model with both it and the NT type can't be fit, since you can't discriminate between the sample 1 effect and the NT effect.
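The confounding can be checked in base R by comparing the rank of the design matrix with its number of columns. This is only a sketch: the column data are transcribed from the table in the question, and the names coldata, mm_full, paired and mm_paired are made up for illustration.

```r
# Column data transcribed from the question's table.
coldata <- data.frame(
  sample = factor(c("01", "02", "02", "03", "03", "04", "04", "05", "05")),
  type   = factor(c("NT", "GC", "AD", "GC", "AD", "AD", "GC", "AD", "GC"),
                  levels = c("NT", "AD", "GC")),
  row.names = c("Barcode_01", "Barcode_03", "Barcode_04", "Barcode_05",
                "Barcode_06", "Barcode_07", "Barcode_08", "Barcode_09",
                "Barcode_10")
)

# Design from the question: sample 01 is the only NT sample, so its sample
# effect cannot be separated from the NT effect.
mm_full <- model.matrix(~ type + sample, data = coldata)
cat("columns:", ncol(mm_full), " rank:", qr(mm_full)$rank, "\n")

# Drop sample 01 and the now-unused factor levels, then fit a paired design.
paired <- droplevels(coldata[coldata$sample != "01", ])
mm_paired <- model.matrix(~ sample + type, data = paired)
cat("columns:", ncol(mm_paired), " rank:", qr(mm_paired)$rank, "\n")
```

A rank below the column count for the first matrix is the condition the DESeq2 error message reports; the second matrix no longer has that problem, at the cost of losing the NT sample from the model.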

written 6.0 years ago by Devon Ryan