Question: Adding two different covariates to the model simultaneously gives an error in DESeq2
fizer wrote (2.2 years ago):


I have RNA-seq data generated from the brain tissue of 100 individuals, half of whom were affected by a disease. The samples come from two different brain regions and were obtained from 4 different providers. I am trying to run DESeq2 to identify differentially expressed genes using a design like ~ age + sex + tissue + PMI + condition, where PMI is the post-mortem interval. My colData looks like this:

          condition age sex tissue    PMI     RIN    brain_bank
A016-93     disease  79   M FCBA89  48.00 6.400000      LND
A032-95     disease  71   F FCBA89  65.00 6.550000      LND
A046-94     disease  70   F FCBA89 100.00 5.400000      LND
AN00216     disease  63   M FCBA89  26.16 6.150000      KBC
AN02738     disease  85   F FCBA89  12.80 6.200000      KBC
SD001-07    disease  37   F     IF  46.00 6.200000      MII
AN03019     control  47   F FCBA89  31.85 5.300000      KBC
SD004-10    control  50   M     IF  63.00 5.400000      MII
A046-91     control  41   M     IF  49.00 5.450000      LND

Here condition, sex, tissue and brain_bank are factors, and age, RIN and PMI are numeric variables. The RIN values are averages because all samples were sequenced in duplicate.
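
A minimal base-R sketch of building this colData with the column types described above. Only the nine rows shown are reproduced (the real table has 100 samples), and the *_scaled column names are hypothetical. Categorical columns become factors via factor(), with control as the reference level. The numeric covariates are centered and scaled with scale(): recent DESeq2 versions print a message recommending this for numeric covariates with a large mean or standard deviation, and because scaled age is no longer integer-valued it should not trigger the integer-value warning either (that warning is only a reminder and can be ignored when age is meant to be numeric). The trade-off is that each coefficient then describes one standard deviation of the covariate rather than one unit.

```r
# Nine rows of colData from the question (the full table has 100 samples).
coldata <- data.frame(
  condition  = c("disease", "disease", "disease", "disease", "disease",
                 "disease", "control", "control", "control"),
  age        = c(79, 71, 70, 63, 85, 37, 47, 50, 41),
  sex        = c("M", "F", "F", "M", "F", "F", "F", "M", "M"),
  tissue     = c("FCBA89", "FCBA89", "FCBA89", "FCBA89", "FCBA89",
                 "IF", "FCBA89", "IF", "IF"),
  PMI        = c(48.00, 65.00, 100.00, 26.16, 12.80, 46.00, 31.85, 63.00, 49.00),
  RIN        = c(6.40, 6.55, 5.40, 6.15, 6.20, 6.20, 5.30, 5.40, 5.45),
  brain_bank = c("LND", "LND", "LND", "KBC", "KBC", "MII", "KBC", "MII", "LND"),
  row.names  = c("A016-93", "A032-95", "A046-94", "AN00216", "AN02738",
                 "SD001-07", "AN03019", "SD004-10", "A046-91")
)

# Categorical variables must be factors; "control" comes first so it is the
# reference level and fold changes read as disease vs control.
coldata$condition  <- factor(coldata$condition, levels = c("control", "disease"))
coldata$sex        <- factor(coldata$sex)
coldata$tissue     <- factor(coldata$tissue)
coldata$brain_bank <- factor(coldata$brain_bank)

# Center and scale the numeric covariates into hypothetical *_scaled columns;
# coefficients then refer to one standard deviation, not one unit.
for (v in c("age", "PMI", "RIN")) {
  coldata[[paste0(v, "_scaled")]] <- as.numeric(scale(coldata[[v]]))
}

str(coldata)
```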

I want to include brain_bank as a covariate, but when I try the design ~ age + sex + tissue + brain_bank + PMI + condition, I get the following output:

the design formula contains a numeric variable with integer values,
 specifying a model with increasing fold change for higher values.
 did you mean for this to be a factor? if so, first convert
 this variable to a factor using the factor() function
Error in checkFullRank(modelMatrix) : 
 the model matrix is not full rank, so the model cannot be fit as specified.
 One or more variables or interaction terms in the design formula are linear
 combinations of the others and must be removed.
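
The error means that at least one column of the model matrix (one column per coefficient) can be written as a combination of other columns, so those coefficients cannot be estimated separately. A minimal base-R sketch with a hypothetical six-sample table, not the poster's data, in which bank MII supplied every IF sample and nothing else: the brain_bankMII column is then identical to tissueIF. Comparing the rank reported by qr() with the number of columns is the usual full-rank test, and the column pivoting of R's default qr() moves redundant columns past the rank.

```r
# Hypothetical toy colData: bank MII supplied every IF sample and nothing else.
toy <- data.frame(
  tissue     = factor(c("FCBA89", "FCBA89", "FCBA89", "IF", "IF", "FCBA89")),
  brain_bank = factor(c("LND", "LND", "KBC", "MII", "MII", "KBC")),
  condition  = factor(c("disease", "control", "disease", "control", "disease", "control"))
)

# Cross-tabulating two factors is a quick way to spot confounding:
# the IF column has counts only in the MII row, and vice versa.
print(table(toy$brain_bank, toy$tissue))

# Columns: (Intercept), tissueIF, brain_bankLND, brain_bankMII, conditiondisease.
mm <- model.matrix(~ tissue + brain_bank + condition, data = toy)

qr_mm <- qr(mm)
full_rank_flag <- qr_mm$rank == ncol(mm)

# Pivoted columns beyond the rank could not be separated from earlier columns.
aliased <- colnames(mm)[qr_mm$pivot[-seq_len(qr_mm$rank)]]

cat("rank:", qr_mm$rank, "of", ncol(mm), "columns; full rank:", full_rank_flag, "\n")
cat("redundant column(s):", aliased, "\n")
```

In this toy case no design can contain both tissue and brain_bank: the data cannot separate a bank effect from a tissue effect, so one of the two terms has to go.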

I have seen this error several times, but as a novice in statistics I have never been able to understand it. If anyone can suggest good reading material about it, that would be great.

Anyway, here are my questions:

1) Is there some way to incorporate brain_bank as a covariate in the analysis, or should I simply go with the first model?

2) Should I include the mean RIN values in the model to account for differences in RNA integrity? Any other suggestions?

3) I am not sure whether this should be asked as a separate question. I also want to run DEXSeq to find differential exon usage between cases and controls. Can I use the following model?

~sample + exon + exon:condition + age:exon + sex:exon + tissue:exon + PMI:exon  ## Full formula 
~sample + exon + age:exon + sex:exon + tissue:exon + PMI:exon   ## Reduced formula

Thanks in advance for helping.
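
For question 3, one way to keep the full and reduced DEXSeq formulas consistent is to generate both from the same covariate list, so that they differ only by the condition:exon term being tested. This follows the pattern of the DEXSeq vignette, where a covariate (the library type there) enters as an interaction with exon in both formulas. The sketch below only builds and compares the formulas in base R; the covariates vector is hypothetical, and whether these interactions suit this dataset is a separate statistical question.

```r
# Hypothetical list of covariates that may affect exon usage.
covariates <- c("age", "sex", "tissue", "PMI")

# Each covariate enters as an interaction with exon, as in the question.
covariate_terms <- paste0(covariates, ":exon")

reduced_rhs <- paste(c("sample", "exon", covariate_terms), collapse = " + ")
full_rhs    <- paste(reduced_rhs, "condition:exon", sep = " + ")

formula_reduced <- as.formula(paste("~", reduced_rhs))
formula_full    <- as.formula(paste("~", full_rhs))

# The full model should contain exactly one extra term: the condition-by-exon
# interaction that the test is about.
extra_terms <- setdiff(attr(terms(formula_full), "term.labels"),
                       attr(terms(formula_reduced), "term.labels"))

print(formula_full)
print(formula_reduced)
print(extra_terms)
```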


Are you sure that age is incorporated as a factor in the model? I doubt it, given this warning message: "the design formula contains a numeric variable with integer values".

(comment by Carlo Yague, 2.2 years ago)

No, my mistake: age is added as numeric. I have corrected the post.

(reply by fizer, 2.2 years ago)

Hello fizer!

It appears that your post has been cross-posted to another site:

This is typically not recommended as it runs the risk of annoying people in both communities.

(comment by WouterDeCoster, 2.2 years ago)

Hi Wouter,

I was not getting any feedback earlier, so I thought maybe I should try another site. I was not aware that cross-posting could be annoying. Thanks.

(reply by fizer, 2.2 years ago)
Carlo Yague wrote (2.2 years ago):

Concerning the "not full rank" issue, I suggest you read section 3.12 of the DESeq2 manual:

[...] There are two main reasons for this problem: either one or more columns in the model matrix are linear combinations of other columns, or there are levels of factors or combinations of levels of multiple factors which are missing samples [...]

Looking at the snippet of the table, I couldn't find a redundancy among the factors that would explain the error. You may find it by looking at the full table, or by testing models that combine brain_bank with different subsets of the other factors.
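
Testing designs one combination at a time, as suggested above, can be scripted. A minimal base-R sketch, assuming coldata would hold the full 100-sample table; only the factor columns of the nine rows in the question are reproduced here, so the results say nothing about the full dataset. The cross-tabulations show which banks contributed which tissues and conditions (a zero pattern that lines up exactly between two factors is a typical cause of the error), and for each candidate design the model-matrix rank is compared with its number of columns. The designs list is a hypothetical selection. On these nine rows every listed design is full rank, consistent with the observation that the snippet alone does not reveal the redundancy.

```r
# Factor columns of the nine rows shown in the question; replace with the full colData.
coldata <- data.frame(
  tissue     = factor(c("FCBA89", "FCBA89", "FCBA89", "FCBA89", "FCBA89",
                        "IF", "FCBA89", "IF", "IF")),
  brain_bank = factor(c("LND", "LND", "LND", "KBC", "KBC", "MII", "KBC", "MII", "LND")),
  condition  = factor(c("disease", "disease", "disease", "disease", "disease",
                        "disease", "control", "control", "control"),
                      levels = c("control", "disease")),
  row.names  = c("A016-93", "A032-95", "A046-94", "AN00216", "AN02738",
                 "SD001-07", "AN03019", "SD004-10", "A046-91")
)

# Which banks supplied which tissues and which conditions?
bank_by_tissue    <- table(coldata$brain_bank, coldata$tissue)
bank_by_condition <- table(coldata$brain_bank, coldata$condition)
print(bank_by_tissue)
print(bank_by_condition)

# TRUE when every column of the design's model matrix is estimable.
is_full_rank <- function(design, data) {
  mm <- model.matrix(design, data = data)
  qr(mm)$rank == ncol(mm)
}

# Hypothetical set of candidate designs to compare.
designs <- list(
  "tissue + condition"              = ~ tissue + condition,
  "brain_bank + condition"          = ~ brain_bank + condition,
  "tissue + brain_bank + condition" = ~ tissue + brain_bank + condition
)

full_rank <- vapply(designs, is_full_rank, logical(1), data = coldata)
print(full_rank)
```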

Concerning your other questions:

  1. A possible workaround is described in the manual.
  2. Looks good to me.
  3. I'm not sure, but perhaps someone with more expertise in DEXSeq can comment on this.
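
On point 1: the workaround in the DESeq2 manual targets the second cause quoted above (factor-level combinations with no samples, typical of nested designs such as individuals numbered within groups). It consists of building the model matrix yourself, dropping the columns that are all zero, and passing the result to DESeq() through its full argument; it does not help when a column is a genuine linear combination of others. A base-R sketch of the matrix step, using a hypothetical nested layout: bank_n numbers the brain banks within each condition, and control samples come from only two banks, so the control-by-bank_n3 interaction column is all zeros.

```r
# Hypothetical nested layout: bank_n numbers the banks within each condition.
toy <- data.frame(
  condition = factor(c("control", "control", "control",
                       "disease", "disease", "disease", "disease")),
  bank_n    = factor(c(1, 1, 2, 1, 2, 2, 3))
)

mm <- model.matrix(~ condition + condition:bank_n, data = toy)

# No control sample has bank_n == 3, so that interaction column is all zero
# and the matrix cannot be full rank.
all_zero <- apply(mm, 2, function(x) all(x == 0))
print(colnames(mm)[all_zero])

# Drop the all-zero columns and confirm the remaining matrix is full rank.
mm_reduced <- mm[, !all_zero, drop = FALSE]
stopifnot(qr(mm_reduced)$rank == ncol(mm_reduced))

# With DESeq2 loaded and a DESeqDataSet `dds` whose samples are in the same
# order, the reduced matrix would then be supplied as:
#   dds <- DESeq(dds, full = mm_reduced)
```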

Hi Carlo,

2. Looks good to me.

I did not understand. Are you suggesting that I should include the averaged RIN score?

Thanks for other suggestions.

(reply by fizer, 2.2 years ago)

Yes, I think that it is a good idea to include the RIN score.

(reply by Carlo Yague, 2.2 years ago)

Although I suspect it's better to create bins (a categorical covariate) than to keep the exact values (a continuous covariate).

(reply by WouterDeCoster, 2.2 years ago)

Can you elaborate? In theory, a continuous covariate provides more information to the model than arbitrary categories, so it should be better. That might not hold if the relationship between the covariate and gene expression (on the log scale the model uses) is not linear... I'm not sure, though.

(reply by Carlo Yague, 2.2 years ago)
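
The trade-off discussed here shows up directly in the model matrix. A minimal sketch using the nine averaged RIN values from the question: a numeric RIN term costs one column (one log2 fold change per unit of RIN, assuming a linear effect on the log scale), while k bins cost k - 1 columns and let each bin shift expression independently, at the price of extra parameters and arbitrary cut points. The cut points (5.5 and 6.2) and bin labels are hypothetical choices for illustration only.

```r
# Averaged RIN values and conditions of the nine samples shown in the question.
d <- data.frame(
  RIN       = c(6.40, 6.55, 5.40, 6.15, 6.20, 6.20, 5.30, 5.40, 5.45),
  condition = factor(c(rep("disease", 6), rep("control", 3)),
                     levels = c("control", "disease"))
)

# Hypothetical cut points giving bins (-Inf, 5.5], (5.5, 6.2], (6.2, Inf].
d$RIN_bin <- cut(d$RIN, breaks = c(-Inf, 5.5, 6.2, Inf),
                 labels = c("low", "mid", "high"))
print(table(d$RIN_bin))

# Continuous: intercept + one RIN slope + condition = 3 coefficients.
mm_continuous <- model.matrix(~ RIN + condition, data = d)
# Binned: intercept + (3 - 1) bin offsets + condition = 4 coefficients.
mm_binned <- model.matrix(~ RIN_bin + condition, data = d)

print(colnames(mm_continuous))
print(colnames(mm_binned))
```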