Question: (Closed) DESeq2 model design + contrasts (not full rank)
gravatar for caggtaagtat
4 months ago by
caggtaagtat1.3k wrote:


I have a question to the DESeq2 contrast. I have single end reads from 16 samples with 4 treatments (group) and 4 biological replicates (indi). However, the replicates are not evenly distributed across the used flowcells (flow). Can I still correct for the flowcell batch effect?

My meta table looks like this:

group indi flow 
  T1   I1   A     
  T1   I2   A     
  T1   I3   B     
  T1   I4   B     
  T2   I1   A     
  T2   I2   A    
  T2   I3   B     
  T2   I4   B     
  T3   I1   A     
  T3   I2   A     
  T3   I3   B     
  T3   I4   B     
  T4   I1   A     
  T4   I2   A     
  T4   I3   B     
  T4   I4   B

I followed the instructions for such case from the DESeq2 manual:

ds_txi <- DESeqDataSetFromTximport(txi = txi_salmon,
                                   colData = meta,
                                   design = ~ indi+group)

ds_txi$indi_n <- c("I1","I2","I1","I2","I1","I2","I1","I2","I1","I2","I1","I2","I1","I2","I1","I2")

meta$indi_n <- c("I1","I2","I1","I2","I1","I2","I1","I2","I1","I2","I1","I2","I1","I2","I1","I2")

meta$indi_n <- as.factor(meta$indi_n)
ds_txi$indi_n <- as.factor(ds_txi$indi_n)

ds_txi <- DESeqDataSetFromTximport(txi = txi_salmon,
                                   colData = meta,
                                   design = ~ flow + flow:indi_n + flow:group)

Resulting in following meta table:

group indi flow indi_n
  T1   I1    A    I1
  T1   I2    A    I2
  T1   I3    B    I1
  T1   I4    B    I2
  T2   I1    A    I1
  T2   I2    A    I2
  T2   I3    B    I1
  T2   I4    B    I2
  T3   I1    A    I1
  T3   I2    A    I2
  T3   I3    B    I1
  T3   I4    B    I2
  T4   I1    A    I1
  T4   I2    A    I2
  T4   I3    B    I1
  T4   I4    B    I2

With following contrast, I get the difference between treatment T1 and T2 within Batch A:

dds<- DESeq(ds_txi)
res<- results(dds,contrast=list("flowA.groupT1","flowA.groupT2"), alpha= p_adjust_treshold,  lfcThreshold = L2FC_treshold)

But how can I get the general differences between treatment (group) T1 and T2 with elimination of the batch effect, if thats possible?

Could I maybe just do something like this:

res<- results(dds,contrast=list(c("flowA.groupT1","flowB.groupT1"),c("flowA.groupT2","flowB.groupT2")), alpha= p_adjust_treshold,  lfcThreshold = L2FC_treshold)

PCA plots:


dge rna-seq contrast deseq2 rank • 242 views
ADD COMMENTlink modified 4 months ago by swbarnes28.9k • written 4 months ago by caggtaagtat1.3k

Does flowcell refer to the sequencing run? There is typically no detectable batch effect due to sequencing. Check with PCA first if there are signs of batch effects, if not then do not try to correct for it.

ADD REPLYlink written 4 months ago by ATpoint40k

The samples were sequenced at the same day but within other illumina flowcells. Now that you say it, the potentially batch effect seems to only affect samples of two treatments in the PCA plot. The other two treatments seem to be not affected. I will attach the figure to the question. I think this could speack against a strong batch effect in the data.

ADD REPLYlink written 4 months ago by caggtaagtat1.3k

I've personally never seen batch effects due to sequencing. Especially in RNA-seq with all the possible batch effects to to potential RNase contamination, variable library prep efficiency etc. the sequencing is neglectible imho. I would remove it from the design formula.

In the PCA one rather sees that two samples of the same treatment always tend to group together. Did you process those that are closer together at the same day and the other two at a different day, or is there any systematic difference you can think of which could explain this. That is probably the batch effect I'd try to correct.

For example in this plot from my last RNA-seq I processed two samples per treatment on the same day (so the actual cDNA synthesis and Illumina prep), and this could be corrected by using the day of library prep as batch. Right one is uncorrected, left is corrected.

enter image description here

ADD REPLYlink modified 4 months ago • written 4 months ago by ATpoint40k

Thank you very much for the suggestions. I know now, that all was done the same way, except of the RNA-isolation which was done seperately for samples of the different flowcells. This means, that I still have to correct for this bias, using the flowcell ID, to be absoluty correct.

How would you correct for this flowcell bias?

ADD REPLYlink written 4 months ago by caggtaagtat1.3k

I would check with the wetlab people if you can find evidence that these samples as below indeed form a batch based on library prep day, sorry for the terrible quality:

enter image description here

What is "indi" in the formula? I guess if the first column is the condition you are interested in it will be something like ~ batch + group.

ADD REPLYlink modified 4 months ago • written 4 months ago by ATpoint40k

I asked them exactly that earlier and they isolated the RNA seperately for sampes of batch A and batch B (flow). Library prep etc. was done the same for every sample. Indi is the biological individuum the samples were collected from. So ideally I would love to do desing= flow+indi+group. But this gives the error not full rank

ADD REPLYlink written 4 months ago by caggtaagtat1.3k

Will someone looking at this data six months from now understand that you aren't really correcting for flow cell, but for isolation date?

ADD REPLYlink written 4 months ago by swbarnes28.9k

I will be the only one working with the data, but I will of course document and rename the variable flow. The problem for correcting the bias stays the same

ADD REPLYlink written 4 months ago by caggtaagtat1.3k

Hello caggtaagtat!

We believe that this post does not fit the main topic of this site.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.


ADD REPLYlink written 4 months ago by ATpoint40k
gravatar for swbarnes2
4 months ago by
United States
swbarnes28.9k wrote:

Unless there was a huge error in the sequencing, being on a different flow cell, or a different lane doesn't cause technical artifacts. You can omit that from your model. You should include RNA isolation batch and library prep batch.

ADD COMMENTlink written 4 months ago by swbarnes28.9k

Unfortuantely, the RNA for the sample of the different flow cells was isolated on different days, so I would definitely like to correct for that bias, if possible.

ADD REPLYlink written 4 months ago by caggtaagtat1.3k
Please log in to add an answer.
The thread is closed. No new answers may be added.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2197 users visited in the last hour