Question

DESeq sample organization

0


9.4 years ago

andremrsantos • 0

I am studying differential transcript expression between cancer and adjacent tissues. My samples are organized as follows:

            sample  type
Barcode_01  01      NT
Barcode_03  02      GC
Barcode_04  02      AD
Barcode_05  03      GC
Barcode_06  03      AD
Barcode_07  04      AD
Barcode_08  04      GC
Barcode_09  05      AD
Barcode_10  05      GC

where GC is gastric cancer tissue and AD is the adjacent tissue. I also have one non-cancerous sample (NT) that I wish to compare. Thus I need to compare:

AD x GC (where I need to account for within-sample variation)

NT x AD

NT x GC

However, when I load my data into DESeq, it returns the following error:

raw <- DESeqDataSetFromMatrix(count, sample.data, ~ type + sample)

Error in DESeqDataSet(se, design = design, ignoreRank) :
  the model matrix is not full rank, so the model cannot be fit as specified.
  one or more variables or interaction terms in the design formula
  are linear combinations of the others and must be removed

Is there some way to organize my data so that my comparisons account for within-sample variation?

software-error DESeq RNA-Seq • 3.5k views

updated 9 months ago by Ram 43k • written 9.4 years ago by andremrsantos • 0

Ram · Answer 1 · 2014-11-24

Devon is right: this analysis is complicated by the fact that sample 1 and NT are confounded, so there's no way to model both effects.

There is a way to hack the column data to fit a model that controls for the sample differences in the GC and AD samples. Make sure that NT is the base level of type (see the vignette). Add a column to the column data, sample.nested = factor(c(1,1,1, 2,2, 3,3, 4,4)), which puts the NT sample in the same level as the first GC/AD pair. Then use a design of ~ sample.nested + type and run DESeq(dds, modelMatrixType="standard").

The AD vs GC results table is straightforward (use 'contrast'), but the ones involving NT are a bit more complicated. If you were to ask for a simple contrast, results(dds, contrast=c("type","AD","NT")), this would only give the comparison within samples 1 and 2. You have to add 1/4 of each of the sample.nested effects listed in resultsNames(dds). So, for example, the numeric contrast for the AD vs NT comparison should be results(dds, contrast=c(0,1/4,1/4,1/4,1,0)).
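To see where the 1/4 weights come from, here is a minimal base-R sketch (no DESeq2 needed). It rebuilds the column data from the question, checks that the nested design is full rank, and derives the AD vs NT contrast as "average AD row minus NT row" of the model matrix. The names coldata, mm, ad_vs_nt and gc_vs_nt are only illustrative, and the sketch assumes the coefficient order matches what resultsNames(dds) reports with a standard model matrix, which is worth checking on the real object.

```r
# Column data from the question, with NT as the base level of type and
# the NT sample sharing level 1 with the first GC/AD pair.
coldata <- data.frame(
  type = factor(c("NT", "GC", "AD", "GC", "AD", "AD", "GC", "AD", "GC"),
                levels = c("NT", "AD", "GC")),
  sample.nested = factor(c(1, 1, 1, 2, 2, 3, 3, 4, 4))
)

# Standard (treatment-coded) model matrix for the suggested design.
mm <- model.matrix(~ sample.nested + type, data = coldata)

# Full rank means every coefficient is identifiable.
full_rank <- qr(mm)$rank == ncol(mm)

# Contrast = mean design row of one group minus mean design row of the other.
group_mean <- function(level) {
  colMeans(mm[coldata$type == level, , drop = FALSE])
}
ad_vs_nt <- group_mean("AD") - group_mean("NT")
gc_vs_nt <- group_mean("GC") - group_mean("NT")

print(full_rank)
print(ad_vs_nt)
print(gc_vs_nt)
```

The ad_vs_nt vector comes out as the same numeric contrast quoted above, so it could be passed as results(dds, contrast = ad_vs_nt); gc_vs_nt is the analogous vector for GC vs NT.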

score 2 · Answer 2 · 2014-11-22


9.4 years ago

Devon Ryan 104k

You need to remove sample 1. A model with both it and the NT type can't be fit, since you can't discriminate between the sample 1 effect and the NT effect.
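The confounding can be checked directly in base R, without DESeq2. This is a small sketch that rebuilds the sample sheet from the question (the object names mirror the question's sample.data; mm, n_coef and mm_rank are illustrative) and compares the number of coefficients the design ~ type + sample asks for with the rank of its model matrix.

```r
# Sample sheet from the question: 9 libraries, 5 samples, 3 tissue types.
sample.data <- data.frame(
  sample = factor(c("01", "02", "02", "03", "03", "04", "04", "05", "05")),
  type   = factor(c("NT", "GC", "AD", "GC", "AD", "AD", "GC", "AD", "GC"),
                  levels = c("NT", "AD", "GC")),
  row.names = sprintf("Barcode_%02d", c(1, 3:10))
)

# Model matrix for the design that DESeq2 rejected.
mm <- model.matrix(~ type + sample, data = sample.data)

n_coef  <- ncol(mm)      # coefficients the model asks for
mm_rank <- qr(mm)$rank   # coefficients the data can actually identify

print(c(coefficients = n_coef, rank = mm_rank))
```

The rank is one less than the number of coefficients: sample 01 contains only the NT library, so the sample 01 effect and the NT effect occupy the same column space, which is exactly what the "not full rank" error reports.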
