Question: including batch in design~
4 weeks ago, Morris_Chair wrote:

Hello, I want to add a batch term to the design in DESeqDataSet, but I get an error:

dds=DESeqDataSetFromTximport(txi,colData = samples,design=~ batch + condition)

Error in DESeqDataSet(se, design = design, ignoreRank) : all variables in design formula must be columns in colData

I don't know what I should add to the colData (my sample table) to get the batch term working.

If I remove batch from the command, it all works:

dds=DESeqDataSetFromTximport(txi,colData = samples,design=~ condition)

using counts and average transcript lengths from tximport

Any help will be appreciated

thanks

rna-seq deseq2 • 154 views
modified 4 weeks ago • written 4 weeks ago by Morris_Chair

batch should be a column in the samples data.frame you provided.

written 4 weeks ago by Asaf

Hi Asaf,

What should the column named batch contain?

Thank you

written 4 weeks ago by Morris_Chair

The name of the batch (like sequencing lane ID for instance) each library was prepared and sequenced in
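As a minimal sketch of what that could look like (the table below and the lane labels `lane1` and `lane2` are made-up placeholders, not taken from the question), a batch column holds one label per library, and several libraries share each label:

```r
# Hypothetical sample table; sample names and lane IDs are placeholders
samples <- data.frame(
  sample_name = paste0("sample", 1:6),
  condition   = factor(rep(c("treated", "untreated"), each = 3))
)

# One batch label per library; each label is shared by several libraries
samples$batch <- factor(c("lane1", "lane2", "lane1", "lane2", "lane1", "lane2"))

# Every variable named in the design formula must be a column of the table
all(c("batch", "condition") %in% colnames(samples))

# Cross-tabulate to see how libraries are spread over batch and condition
table(samples$batch, samples$condition)
```

The cross-tabulation is a quick way to confirm that each batch contains more than one library before building the dataset.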

written 4 weeks ago by Asaf

Hi Asaf, I followed your suggestion but got another error:

Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

  Please read the vignette section 'Model matrix not full rank':

  vignette('DESeq2')

Here is my sample data.frame:

run     sample_name   condition   batch
T-ALL   sample1       treated     SRR1
T-ALL   sample2       treated     SRR2
T-ALL   sample3       treated     SRR3
T-ALL   sample4       treated     SRR4
T-ALL   sample5       treated     SRR5
T-ALL   sample6       treated     SRR6
T-ALL   sample7       treated     SRR7
T-ALL   sample8       treated     SRR8
T-ALL   sample9       treated     SRR9
T-ALL   sample10      treated     SRR10
T-ALL   sample11      treated     SRR11
T-ALL   sample12      treated     SRR12
T-ALL   sample13      untreated   SRR13
T-ALL   sample14      untreated   SRR14
T-ALL   sample15      untreated   SRR15
T-ALL   sample16      untreated   SRR16

Here is my design

dds=DESeqDataSetFromTximport(txi,colData = samples, design=~ batch+condition)

from the vignette

thank you

modified 4 weeks ago • written 4 weeks ago by Morris_Chair

A batch is a group of libraries that might share a confounding effect, such as the same technician or the same sequencing run. Of course each library is different, but that is exactly what you are testing. Unless you have prior knowledge about groups of libraries that might share a confounding effect, you don't need to batch correct.

written 4 weeks ago by Asaf

I want to compare how much the PCA plots or heatmap change when I subtract the batch effect. Do you have any idea why it is not working in my code? I read in the vignette that there are two possible causes for an error message like mine, but as far as I understand neither of them fits the situation above.

Thank you

modified 4 weeks ago • written 4 weeks ago by Morris_Chair

What batches do you have in your data? The table you posted does not describe a batch effect: a batch should group several libraries, not be specific to each library.
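A rough way to see this in base R, without DESeq2: build the model matrix for the design and compare its rank with its number of columns. The first table below is modeled on the one posted above (one batch label per library); the grouped labels `A` and `B` in the second are purely hypothetical, used only to show the contrast.

```r
# Modeled on the posted table: 16 libraries, each with its own batch label
samples <- data.frame(
  condition = factor(rep(c("treated", "untreated"), times = c(12, 4))),
  batch     = factor(paste0("SRR", 1:16))
)
mm <- model.matrix(~ batch + condition, data = samples)
ncol(mm)      # coefficients requested by the design
qr(mm)$rank   # coefficients that can actually be estimated

# Hypothetical grouping: two batches, each containing several libraries
samples$batch <- factor(rep(c("A", "B"), times = 8))
mm2 <- model.matrix(~ batch + condition, data = samples)
qr(mm2)$rank == ncol(mm2)   # TRUE when the design is full rank
```

With one batch level per library, the batch columns already account for every sample, so the condition column is a linear combination of them and the rank falls short of the column count.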

written 4 weeks ago by Asaf

Hi Asaf, after a few days of trying I have to ask again because I still can't figure it out. Here is a summary of the situation; it's a bit different from the one above, but I hope we can solve it.

Here is my colData:

sample_id   condition   batch
sample1     white.      1
sample2     white.      1
sample3     green.      1
sample4     green.      1
sample5     purple.     1
sample6     purple.     1
sample7     red.        1
sample8     red.        1

I give each file its sample name:

names(files) <- paste0((colData$sample_id),1:8)

all(file.exists(files))
TRUE

tximport step....

and here is the dds call with the design formula:

dds <- DESeqDataSetFromTximport(txi.salmon,colData=colData, design= ~ batch+condition)

I have two errors:

the design formula contains a numeric variable with integer values,
specifying a model with increasing fold change for higher values.
did you mean for this to be a factor? if so, first convert
this variable to a factor using the factor() function

Error in checkFullRank(modelMatrix) :
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

Can you help me fix it? What is the right way to introduce the batch in the formula?

I can get rid of those errors by using a letter instead of a number (like I), but then I get another problem:

Error in DESeqDataSet(se, design = design, ignoreRank) : 
  design contains one or more variables with all samples having the same value,
  remove these variables from the design
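A small base R sketch of what these messages seem to point at, assuming a table modeled on the colData above (the condition names are simplified here): converting batch with factor() deals with the numeric-variable message, but a variable with a single level still cannot be told apart from the intercept.

```r
# Modeled on the posted colData: every sample has the same batch value
colData <- data.frame(
  sample_id = paste0("sample", 1:8),
  condition = factor(rep(c("white", "green", "purple", "red"), each = 2)),
  batch     = rep(1, 8)
)

# Storing batch as a factor addresses the "numeric variable" message
colData$batch <- factor(colData$batch)

# A single level means batch carries no information about the samples
nlevels(colData$batch)

# The design without batch is full rank for this table
mm <- model.matrix(~ condition, data = colData)
qr(mm)$rank == ncol(mm)
```

If all libraries really were processed together, there is no batch effect to model in this table, which would be consistent with the design working once batch is removed.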

Something is wrong with this batch argument, because if I take it out it works fine.

Thank you

modified 28 days ago • written 28 days ago by Morris_Chair