Question: DESeq2 with multiple variables
9 months ago, andreiareis1987 wrote:

Hi there, I have some doubts about my analysis. I have several variables in my RNA-seq analysis and I want to get the DEGs.

sample          experiment  typeCell  batch
SD1-1_S6        C           SN1       1
SD2-2_S13       C           SN1       1
SD3-1_S4        C           SN1       1
SD4-1_S17       C           SN1       1
S-SC1-1_S8      C           MN        1
S-SC2-1_S16     C           MN        1
S-SC3_S11       C           MN        1
S-SC4-1_S14     C           MN        1
TD1-2_S18       L           SN1       1
TD2-1_S9        L           SN1       1
TD3-1_S2        L           SN1       1
TD4-1_S1        L           SN1       1
TD6-1_S3        L           SN1       1
T-SC2-1_S5      L           MN        1
T-SC3-1_S12     L           MN        1
T-SC4-1_S10     L           MN        1
T-SC5_S15       L           MN        1
T-SC6_S7        L           MN        1
SCI-C-2-S3      L           PN        2
SCI-C-4-S6      L           PN        2
SCI-C-5-S7      L           PN        2
SCI-C-6-S17     L           PN        2
SCI-C-7-S11     L           PN        2
SCI-DL-1-S8     L           SN2       2
SCI-DL-2-S12    L           SN2       2
SCI-DL-4-S10    L           SN2       2
SCI-DL-6-S4     L           SN2       2
SCI-DR-5-S18    L           SN2       2
SHA-C-4-1-S13   C           PN        2
SHA-C-6-S15     C           PN        2
SHA-C-7-S16     C           PN        2
SHA-C-8-S5      C           PN        2
SHA-DL-1-S14    C           SN2       2
SHA-DL-4-S2     C           SN2       2
SHA-DR-5-S1     C           SN2       2
SHA-DR-8-S9     C           SN2       2

So, to compare only the C (control) samples between the MN and SN1 cells, I did:


data2 <- data2[data2$typeCell %in% c("SN1", "MN"), ]
dds3 <- DESeqDataSetFromMatrix(countData = table2, colData = data2, design = ~ typeCell)
dds3 <- estimateSizeFactors(dds3, controlGenes = index)


Is it correct to filter for the conditions before doing the normalization, or do I need to filter after the normalization?

Note: I took this approach because when I filtered after the normalization I got the error "Error in checkFullRank(modelMatrix) : ...". I tried to check for redundant columns, but I still get the error!

Thanks in advance for your time.

Best Regards, Andreia


You probably got the error because batch is redundant with cell type: batch 1 contains only SN1 and MN, and batch 2 contains only PN and SN2 (a bad experimental design). Remove batch from the design and try again. You can do the normalization either way, but if you think gene expression is broadly similar across all cell types, you should normalize using all the samples; it will give a better estimate of expression variance.
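The redundancy described here can be checked with base R alone. This is a minimal sketch on a hypothetical miniature of the sample sheet (two samples per cell type, batch assigned as in the question); the column names follow the question, everything else is made up for illustration.

```r
# Hypothetical miniature of the sample sheet: two samples per cell type,
# with SN1/MN in batch 1 and PN/SN2 in batch 2, as in the question.
coldata <- data.frame(
  typeCell = factor(rep(c("SN1", "MN", "PN", "SN2"), each = 2)),
  batch    = factor(rep(c(1, 2), each = 4))
)

# Cross-tabulation: every cell type occurs in exactly one batch.
print(table(coldata$typeCell, coldata$batch))

# Compare the number of model-matrix columns with the matrix rank.
mm <- model.matrix(~ typeCell + batch, data = coldata)
rank_mm <- qr(mm)$rank
cat("columns:", ncol(mm), " rank:", rank_mm, "\n")

# FALSE here means the design is not full rank.
is_full_rank <- rank_mm == ncol(mm)
print(is_full_rank)
```

When the rank is smaller than the number of columns, at least one column is a linear combination of the others, which is the kind of situation the checkFullRank error reports.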

written 9 months ago by Asaf

I removed the column and I still get the same error. :(

written 9 months ago by andreiareis1987

Can you add what you tried and how it failed?

written 9 months ago by Asaf
9 months ago, swbarnes2 (United States) wrote:

If you want to compare two subsets of samples to each other...don't do it like this. I think some of these chopping steps are wrong, and that's why you have an error.

Make a new column that has experiment concatenated with celltype.
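A minimal sketch of that step in base R, using a hypothetical excerpt of the sample sheet with one sample per group. The column name `group` is an assumption made for illustration; any name works.

```r
# Hypothetical excerpt of the sample sheet (one sample per group for brevity).
coldata <- data.frame(
  sample     = c("SD1-1_S6", "S-SC1-1_S8", "TD1-2_S18", "T-SC2-1_S5"),
  experiment = c("C", "C", "L", "L"),
  typeCell   = c("SN1", "MN", "SN1", "MN")
)

# Concatenate experiment and cell type into a single grouping factor.
# "group" is an illustrative column name, not one from the original data.
coldata$group <- factor(paste(coldata$experiment, coldata$typeCell, sep = "_"))

print(coldata)
print(levels(coldata$group))
```

Using an underscore as the separator keeps the level names limited to letters, numbers and underscores, which avoids surprises when the levels are later used in a design formula.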

Make a dds with all the data. If you really don't want all the samples normalized together (a lot of the time you do want all the samples normalized together, even the ones you aren't directly comparing), don't do it by chopping up your input files. Make new dds objects, like

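# keep: a character vector of the sample names to retain
dds_keep <- dds[, colnames(dds) %in% keep]

# or subset by a column of the sample table (here a column called Tissue)
dds_mytissue <- dds[, dds$Tissue %in% c('mytissue')]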

You might need some droplevels() calls to clean up unused factor levels in the design variables.
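Why the cleanup matters can be shown with base R on a hypothetical sample sheet: after subsetting, a factor still remembers the levels that no longer have samples, and those levels become all-zero columns in the model matrix. This is only a sketch, and leftover levels are one possible cause of a full-rank error, not a confirmed diagnosis of the one in the question.

```r
# Hypothetical sample sheet with four cell types, two samples each.
coldata <- data.frame(
  typeCell = factor(rep(c("SN1", "MN", "PN", "SN2"), each = 2))
)

# Keep only SN1 and MN; the factor still carries PN and SN2 as levels.
sub <- coldata[coldata$typeCell %in% c("SN1", "MN"), , drop = FALSE]
mm_before <- model.matrix(~ typeCell, data = sub)
cat("before droplevels: columns", ncol(mm_before),
    "rank", qr(mm_before)$rank, "\n")

# Dropping the unused levels removes the all-zero columns.
sub$typeCell <- droplevels(sub$typeCell)
mm_after <- model.matrix(~ typeCell, data = sub)
cat("after droplevels: columns", ncol(mm_after),
    "rank", qr(mm_after)$rank, "\n")
```

On a subsetted DESeqDataSet the equivalent cleanup would be along the lines of `dds$typeCell <- droplevels(dds$typeCell)`, applied to whichever column appears in the design.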

I also strongly recommend that you not just run the DESeq command like that. Specify the contrasts you want. The idea is to make it as easy as possible for you to figure out what you did 6 months from now. To compare two subsets to each other, use the concatenated column as the design, and specify the comparison you want with the contrast argument of the results command.
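Put together, the approach could look like the sketch below. It runs on simulated counts, not the real data: the gene count, the means and dispersions, the sample names and the `group` column name are all assumptions made so the example is self-contained. Only `DESeqDataSetFromMatrix`, `DESeq` and `results` with a `contrast` vector are DESeq2 calls.

```r
library(DESeq2)

set.seed(1)

# Simulated stand-in for the real sample sheet: 4 samples in each of 4 groups.
coldata <- data.frame(
  experiment = rep(c("C", "L"), each = 8),
  typeCell   = rep(rep(c("SN1", "MN"), each = 4), times = 2)
)
coldata$group <- factor(paste(coldata$experiment, coldata$typeCell, sep = "_"))
rownames(coldata) <- paste0("sample", seq_len(nrow(coldata)))

# Random negative-binomial counts for 500 hypothetical genes.
# The matrix is filled column by column, so the per-gene vectors recycle
# correctly across samples.
n_genes   <- 500
gene_mean <- 2^runif(n_genes, 4, 10)
gene_disp <- 0.1 + 4 / gene_mean
counts <- matrix(
  rnbinom(n_genes * nrow(coldata), mu = gene_mean, size = 1 / gene_disp),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), rownames(coldata))
)

# Fit once on all samples, using the combined column as the design.
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ group)
dds <- DESeq(dds)

# Control samples only: SN1 versus MN (log2 fold change is SN1 over MN).
res <- results(dds, contrast = c("group", "C_SN1", "C_MN"))
print(head(res))
```

Writing the contrast out as `c("group", "C_SN1", "C_MN")` records in the script exactly which two groups were compared and in which direction.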
