DESeq2 design for an experiment with multiple factors
3.0 years ago

Dear all,

First of all, I should say that I'm new to RNA-seq analysis and to the DESeq2 package. I also have only (very) basic knowledge of statistics, so my apologies if I'm asking naive questions :)

We would like to analyse different cell populations that we isolated from different sample types/environments (blood, ascites, tumor) from different patients. RNA sequencing was done in bulk. Because these data were generated in a collaboration between several research groups, not all of the cells were isolated in the same lab. I would of course like to account for this parameter.

The idea behind my design is the following: because I expect differences between cell types (of course) and between conditions (the environment), I've created a new column in my annotation object that combines (with paste0) the cell_type and condition (cond) columns. In brief, I will consider "gMDSC from blood" to be a different cell population from "gMDSC from ascites".
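For illustration, a minimal sketch of how such a combined column can be built. The three rows are toy values chosen to mirror the annotation columns described here, not the real data, and the underscore separator is assumed from the group names shown in the table.

```r
# Toy annotation table (hypothetical rows mirroring the real columns)
annot <- data.frame(
  cell_type = c("gMDSC", "gMDSC", "gMDSC"),
  cond      = c("Cancer_Blood", "Ascites", "Tumor"),
  row.names = c("CA.gMDSC.Blood", "DE.gMDSC.Ascites", "NO.gMDSC.Tumor")
)

# Combine cell type and condition into one grouping factor
annot$group <- factor(paste0(annot$cell_type, "_", annot$cond))

levels(annot$group)
```

Storing the result as a factor makes the set of levels explicit, which is what the design formula works with.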

Here's an example of my annotation data frame:

                 cell_type          cond origin               group
CA.gMDSC.Blood       gMDSC  Cancer_Blood      1  gMDSC_Cancer_Blood
DE.gMDSC.Ascites     gMDSC       Ascites      1       gMDSC_Ascites
DE.gMDSC.Blood       gMDSC  Cancer_Blood      1  gMDSC_Cancer_Blood
DO.gMDSC.Blood       gMDSC  Cancer_Blood      1  gMDSC_Cancer_Blood
FR.gMDSC.Ascites     gMDSC       Ascites      1       gMDSC_Ascites
FR.gMDSC.Blood       gMDSC  Cancer_Blood      1  gMDSC_Cancer_Blood
FR.gMDSC.Spleen      gMDSC Cancer_Spleen      1 gMDSC_Cancer_Spleen
KD.gMDSC.Ascites     gMDSC       Ascites      1       gMDSC_Ascites
KD.gMDSC.Blood       gMDSC  Cancer_Blood      1  gMDSC_Cancer_Blood
NO.gMDSC.Ascites     gMDSC       Ascites      1       gMDSC_Ascites
NO.gMDSC.Tumor       gMDSC         Tumor      1         gMDSC_Tumor
ON.gMDSC.Blood       gMDSC  Cancer_Blood      1  gMDSC_Cancer_Blood
ON.gMDSC.Tumor       gMDSC         Tumor      1         gMDSC_Tumor
RE.gMDSC.Blood       gMDSC  Cancer_Blood      1  gMDSC_Cancer_Blood
RE.gMDSC.Tumor       gMDSC         Tumor      1         gMDSC_Tumor
RI.gMDSC.Blood       gMDSC  Cancer_Blood      1  gMDSC_Cancer_Blood
RI.gMDSC.Tumor       gMDSC         Tumor      1         gMDSC_Tumor
SH.gMDSC.Tumor       gMDSC         Tumor      1         gMDSC_Tumor
TI.gMDSC.Ascites     gMDSC       Ascites      1       gMDSC_Ascites
TI.gMDSC.Tumor       gMDSC         Tumor      1         gMDSC_Tumor
A01.gMDSC            gMDSC       Ascites      2       gMDSC_Ascites
A03.gMDSC            gMDSC       Ascites      2       gMDSC_Ascites


. . .

Sample names are used as row names. The values 1, 2, 3 and 4 are the four levels of my "origin" factor and correspond to the different research groups that isolated the cells.

The way I understood the DESeq2 design formula is: you put the factor you want to use for the comparison in your analysis last, and the factors you want to "control" for first. I guess "control" here means "taking into account the variability due to this factor while testing for differentially expressed genes (DEG) for the factor of interest".
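To make that reading concrete, here is a small base-R sketch of the model matrix that an additive formula of this shape produces. The data frame `toy` and its levels are made up for illustration. In DESeq2, results() reports a comparison for the last variable in the design by default, and other comparisons can be requested with its contrast argument.

```r
# Toy design: two labs (origin) and two groups, fully crossed
toy <- data.frame(
  origin = factor(c(1, 1, 2, 2)),
  group  = factor(c("A", "B", "A", "B"))
)

# One column per estimated coefficient
mm <- model.matrix(~ origin + group, data = toy)
colnames(mm)
```

The "origin2" column absorbs the shift between the two labs, so the "groupB" coefficient is the B versus A difference after that shift has been accounted for.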

Here was my formula:

library(DESeq2)

# origin = lab that isolated the cells; group = cell_type combined with cond
dds <- DESeqDataSetFromMatrix(countData = cnt,
                              colData   = annot,
                              design    = ~ origin + group)


Unfortunately, I got this error message:

"Error in checkFullRank(modelMatrix) : the model matrix is not full rank, so the model cannot be fit as specified. One or more variables or interaction terms in the design formula are linear combinations of the others and must be removed. Please read the vignette section 'Model matrix not full rank': vignette('DESeq2')"

If I remove "origin" from my design formula, the script runs fine. But I feel that I'm missing something quite important here.
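The error can be reproduced outside DESeq2 with base R alone. The sketch below uses a made-up data frame `toy` in which lab 2 supplies only groups that no other lab supplies. This is one way an additive design becomes rank deficient, and is not necessarily what happens in the real annotation.

```r
# Toy design: groups C and D come only from origin 2,
# and origin 2 provides nothing else
toy <- data.frame(
  origin = factor(c(1, 1, 1, 1, 2, 2, 2, 2)),
  group  = factor(c("A", "B", "A", "B", "C", "D", "C", "D"))
)

mm <- model.matrix(~ origin + group, data = toy)

# Compare the number of coefficients with the matrix rank
n_coef    <- ncol(mm)
rank_mm   <- qr(mm)$rank
full_rank <- rank_mm == n_coef

c(n_coef = n_coef, rank = rank_mm)
full_rank
```

Here the "origin2" column equals the sum of the "groupC" and "groupD" columns, so the lab effect cannot be separated from those two group effects. Running the same check on `model.matrix(~ origin + group, data = annot)` would show whether the real design has this kind of dependency.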

So I'm quite lost here... Am I heading in the right direction for this kind of analysis (comparing cell populations), or am I completely wrong?

Thanks in advance for your help, and sorry if I forgot to include some important information; do not hesitate to ask for it :)

Chris

R RNA-Seq DEseq2

Your origin column appears to encode the same info as the group column, doesn't it?


Sorry, ignore that comment; I was confused by the alignment in your data frame.


Is there a level of "group" whose samples all come from a single origin, or a research centre that only provided samples of a single type?
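A cross-tabulation is a quick way to check this. The sketch below uses a made-up data frame `toy`; with the real data, the same calls would be applied to the origin and group columns of the annotation table.

```r
# Toy annotation: group A is shared by both labs, B and C are not
toy <- data.frame(
  origin = factor(c(1, 1, 2, 2)),
  group  = factor(c("A", "B", "A", "C"))
)

# Sample counts for every origin x group combination
tab <- table(origin = toy$origin, group = toy$group)
tab

# Number of distinct groups per origin, and origins per group
groups_per_origin <- rowSums(tab > 0)
origins_per_group <- colSums(tab > 0)

# Groups that were isolated by one lab only
single_origin_groups <- names(origins_per_group)[origins_per_group == 1]
single_origin_groups
```

Groups that appear under a single origin are the ones worth a closer look, since for them the lab effect and the group effect are the hardest to tell apart.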


Hi, thank you for your time

No, for each level of "origin", there are at least two levels of "group" :)