Question

Multifactor Design For RNA-seq Analysis

1

Entering edit mode

2.8 years ago

David Martínez ▴ 10

Hello everyone,

I'm performing a RNA-seq analysis using DESeq2 and I'm wondering what is the correct analysis approach for my multifactor design. We have searched different posts in different forums and I can't figure out how to approach it.

Here I paste my multifactor design:

enter image description here

Using a ~batch + condition design, we obtained the following error:

some variables in design formula are characters, converting to factorsError in checkFullRank(modelMatrix) : the model matrix is not full rank, so the model cannot be fit as specified. One or more variables or interaction terms in the design formula are linear combinations of the others and must be removed.

We have also checked DESeq2 multiple factors nested design and we are still obtaining the same error. What is the correct factor design for this experiment?

Thank you so much for advance, David.

Rna-seq batch correction DESeq2 multifactor • 1.7k views

ADD COMMENT • link 2.8 years ago by David Martínez ▴ 10

1

Entering edit mode

Is this the full design matrix? Because it looks full rank to me.

ADD REPLY • link 2.8 years ago by i.sudbery 19k

0

Entering edit mode

Can you run model.matrix(~batch + condition, data=design), where design is your design matrix above and tell us if any of the columns are all zero?

and post a sample of the head of the resulting table.

ADD REPLY • link 2.8 years ago by i.sudbery 19k

0

Entering edit mode

Thank you for your time. The complete resulting table is:

enter image description here

ADD REPLY • link 2.8 years ago by David Martínez ▴ 10

0

Entering edit mode

I definately can't reproduce this - if I run deseq2 on the design matrix you posted, it works fine.

ADD REPLY • link 2.8 years ago by i.sudbery 19k

0

Entering edit mode

Did you type down these information by hand or did you read this somehow programatically? Could be some hidden whitespaces somewhere which then would artifically create a new unwanted group. Make a data.frame out of this design, convert it to factor and then check the levels of the columns to be what you want. Must be four levels for condition and three for batch.

ADD REPLY • link 2.8 years ago by ATpoint 81k

0

Entering edit mode

Thank you for your time and your hints. I did a file (named samples.txt) to introduce data and i have checked it and it's ok. What we want is to incorporate to our design all of the aforementioned parameters (condition, batch and exp). We are trying to correct batch effect associated to the library preparation (columns "batch") and experiment (column "exp").

If you need any extra data or our code, please let me know.

Thank you, David.

ADD REPLY • link 2.8 years ago by David Martínez ▴ 10

1

Entering edit mode

Ah! Are you correcting by both batch AND experiment? You can't do that because the batches are composed of the experiments.

ADD REPLY • link 2.8 years ago by i.sudbery 19k

0

Entering edit mode

Thank you for your comments.

ADD REPLY • link 2.8 years ago by David Martínez ▴ 10

0

Entering edit mode

So is exp now included or not? Above in the question you only mention batch and condition.

Please provide the output of dput(cdata) where cdata is the data.frame with the colData information, and please provide the exact command for creating the dds.

ADD REPLY • link 2.8 years ago by ATpoint 81k

0

Entering edit mode

The commands we use are:

#Create a DESeq2 object with DESeqDataSetFromTximport function
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                       directory = dir,
                                       design= ~ batch + condition + exp)

#Setting comparison order for DESeq2. If not configured, will be set in alphabetical order. Relevel allow us to specificate the reference level.
ddsHTSeq$condition <- relevel(ddsHTSeq$condition, ref = "2m")

#Change name to ddsHTSeq object.
dds <-ddsHTSeq

#Pre-filtering (Removing rows in which we have less than 1 read in all samples combined)
#dds <- dds[rowSums(DESeq2::counts(dds)) >= 1] 
keep <- rowSums(counts(dds)) >= 1
dds <- dds[keep,]

#Run differential analysis with DESeq2
dds<-DESeq(dds)

And the output of of dput(cdata) (its name is different in my case) is:

structure(list(sampleName = c("Y_F_7", "Y_F_8", "Y_F_6", 
"Y_F_3", "Y_F_2", "Y_F_1", "Y_F_5", "Old_F_6", 
"O_F_5", "O_F_11", "O_F_10", "O_F_9", "O_F_4", 
"O_F_1", "O_F_12", "O_F_13", "O_F_14", "O_F_2", 
"O_F_3", "2m_#09", "2m_#11", "2m_#12", "2m_#10", "2m_#13", 
"28m_#06", "28m_#08", "28m_#09", "28m_#20", "28m_#21", "28m_#22"
), fileName = c("NO_124_htseq.txt", "NO_125_htseq.txt", "NO_126_htseq.txt", 
"NO_127_htseq.txt", "NO_128_htseq.txt", "NO_140_htseq.txt", "NO_141_htseq.txt", 
"NO_142_htseq.txt", "NO_93_htseq.txt", "NO_94_htseq.txt", "NO_95_htseq.txt", 
"SRR001_htseq.txt", "SRR002_htseq.txt", "SRR003_htseq.txt", 
"SRR004_htseq.txt", "SRR005_htseq.txt", "SRR006_htseq.txt", 
"SRR007_htseq.txt", "SRR008_htseq.txt", "SRR009_htseq.txt", 
"SRR010_htseq.txt", "SRR011_htseq.txt", "SRR012_htseq.txt", 
"SRR013_htseq.txt", "SRR014_htseq.txt", "SRR015_htseq.txt", 
"SRR016_htseq.txt", "SRR017_htseq.txt", "SRR018_htseq.txt", 
"SRR019_htseq.txt"), condition = c("3m", "3m", "3m", "3m", 
"3m", "3m", "3m", "29m", "29m", "29m", "29m", "29m", 
"29m", "29m", "29m", "29m", "29m", "29m", "29m", "2m", 
"2m", "2m", "2m", "2m", "28m", "28m", "28m", "28m", "28m", 
"28m"), batch = c("fr_secondstrand", "fr_secondstrand", "fr_firststrand", 
"fr_firststrand", "fr_secondstrand", "fr_secondstrand", "fr_firststrand", 
"fr_firststrand", "fr_firststrand", "fr_firststrand", "fr_firststrand", 
"fr_firststrand", "fr_secondstrand", "fr_firststrand", "fr_secondstrand", 
"fr_secondstrand", "fr_firststrand", "fr_firststrand", "fr_secondstrand", 
"fr_firststrand", "fr_firststrand", "fr_firststrand", "fr_firststrand", 
"fr_firststrand", "Unstranded", "Unstranded", "Unstranded", "fr_firststrand", 
"fr_firststrand", "fr_firststrand"), exp = c("E", "E", "D", "C", 
"C", "C", "D", "D", "D", "E", "E", "E", "D", "C", "E", "E", "E", 
"C", "C", "A", "A", "A", "A", "A", "B", "B", "B", "A", "A", "A"
)), row.names = c(NA, -30L), class = "data.frame")

ADD REPLY • link updated 2.8 years ago by i.sudbery 19k • written 2.8 years ago by David Martínez ▴ 10

score 1 · Answer 1 · 2021-06-14

The problem here is that because experiments all used the same sequencing design ("fr_firststand", "fr_secondstrand" or "unstranded"), it is not possible to disentangle what is a library strategy specific effect, and what is an specific experiment effect. You thus cannot correct for both.

No matter though, its almost certainly not neccessary to correct for both.

If you correct for exp, but not for batch, the exp correction factors will encompass the library design specific effects. Thus the design ~condition + exp will correct for any different in library construction strategy.

However, correcting for exp involves computing 4 coefficients, which will reduce accuracy/power. If the only thing different between experiments is the library construction strategy, you might consider correcting for "batch" and not "exp".