Question: DESeq2 multiple variables!
6 weeks ago by
andreiareis1987 wrote:

Hi there, I have a question about my analysis. I have several variables in my RNA-seq analysis and I want to get the DEGs.

sample          experiment  typeCell  batch
SD1-1_S6        C           SN1       1
SD2-2_S13       C           SN1       1
SD3-1_S4        C           SN1       1
SD4-1_S17       C           SN1       1
S-SC1-1_S8      C           MN        1
S-SC2-1_S16     C           MN        1
S-SC3_S11       C           MN        1
S-SC4-1_S14     C           MN        1
TD1-2_S18       L           SN1       1
TD2-1_S9        L           SN1       1
TD3-1_S2        L           SN1       1
TD4-1_S1        L           SN1       1
TD6-1_S3        L           SN1       1
T-SC2-1_S5      L           MN        1
T-SC3-1_S12     L           MN        1
T-SC4-1_S10     L           MN        1
T-SC5_S15       L           MN        1
T-SC6_S7        L           MN        1
SCI-C-2-S3      L           PN        2
SCI-C-4-S6      L           PN        2
SCI-C-5-S7      L           PN        2
SCI-C-6-S17     L           PN        2
SCI-C-7-S11     L           PN        2
SCI-DL-1-S8     L           SN2       2
SCI-DL-2-S12    L           SN2       2
SCI-DL-4-S10    L           SN2       2
SCI-DL-6-S4     L           SN2       2
SCI-DR-5-S18    L           SN2       2
SHA-C-4-1-S13   C           PN        2
SHA-C-6-S15     C           PN        2
SHA-C-7-S16     C           PN        2
SHA-C-8-S5      C           PN        2
SHA-DL-1-S14    C           SN2       2
SHA-DL-4-S2     C           SN2       2
SHA-DR-5-S1     C           SN2       2
SHA-DR-8-S9     C           SN2       2

So, to compare only the C (control) samples between the MN and SN1 cells, I did:


data2 <- data2[data2$typeCell %in% c("SN1", "MN"), ]

dds3 <- DESeqDataSetFromMatrix(countData = table2, colData = data2, design = ~ typeCell)

dds3 <- estimateSizeFactors(dds3, controlGenes = index)


Is it correct to filter for the conditions before doing the normalization, or do I need to filter after the normalization?

Note: I took this approach because when I filtered after the normalization I got the error "Error in checkFullRank(modelMatrix) : ...". I tried to check for redundant columns, but I still get the error!

Thanks in advance for your time.

Best Regards, Andreia


You probably got the error because batch is redundant with cell type: SN1 and MN appear only in batch 1, and PN and SN2 only in batch 2 (a bad experimental design). Try removing batch and run it again. You can do the normalization either way, but if you think gene expression is broadly similar across all cell types, you should normalize using all the samples; it will give a better estimate of expression variance.
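The redundancy can be checked without DESeq2 at all. The following is a minimal base R sketch, assuming a sample sheet with the same level counts as the table in the question (9 samples per cell type, 18 per batch); the `coldata` object is built here for illustration and is not the asker's `data2`.

```r
# Illustrative sample sheet with the same layout as the table in the question:
# SN1 and MN were run in batch 1, PN and SN2 in batch 2.
coldata <- data.frame(
  typeCell = factor(rep(c("SN1", "MN", "PN", "SN2"), times = c(9, 9, 9, 9))),
  batch    = factor(rep(c(1, 2), times = c(18, 18)))
)

# Cross-tabulation: every cell type falls in exactly one batch.
table(coldata$batch, coldata$typeCell)

# A design with both terms has more columns than its rank,
# which is the condition checkFullRank() complains about.
mm_full   <- model.matrix(~ batch + typeCell, data = coldata)
rank_full <- qr(mm_full)$rank

# Dropping batch leaves a full-rank design.
mm_cell   <- model.matrix(~ typeCell, data = coldata)
rank_cell <- qr(mm_cell)$rank

c(columns = ncol(mm_full), rank = rank_full)
c(columns = ncol(mm_cell), rank = rank_cell)
```

Comparing `ncol()` with the QR rank of the model matrix is a generic linear-algebra check; it does not depend on the count data.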

written 6 weeks ago by Asaf

I removed the column and I still got the same error. :(

written 6 weeks ago by andreiareis1987

Can you add what you tried and how it failed?

written 6 weeks ago by Asaf
6 weeks ago by
United States
swbarnes2 wrote:

If you want to compare two subsets of samples to each other...don't do it like this. I think some of these chopping steps are wrong, and that's why you have an error.

Make a new column that has experiment concatenated with celltype.
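As a minimal sketch of that step in base R, assuming the sample sheet has columns named `experiment` and `typeCell` as in the question (the four-row `coldata` below is made up for illustration, and `group` is just a name chosen here):

```r
# Tiny illustrative sample sheet; the real one would have all 36 samples.
coldata <- data.frame(
  experiment = c("C", "C", "L", "L"),
  typeCell   = c("SN1", "MN", "SN1", "MN")
)

# One factor whose levels encode both variables, e.g. "C_SN1".
coldata$group <- factor(paste(coldata$experiment, coldata$typeCell, sep = "_"))

levels(coldata$group)
```

Each level of `group` is then one experiment-by-cell-type combination, so any pair of combinations can be compared directly.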

Make dds with all the data. If you really don't want all the samples normalized together (lots of the time, you do want all the samples normalized together, even the ones you aren't directly comparing), don't do it by chopping up your input files. Make new dds objects, like

dds_keep <- dds[,colnames(dds) %in% keep]


dds_mytissue <-dds[ ,dds$Tissue %in% c('mytissue')]

You might need some droplevels commands to clean up unused design factors.
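Here is a base R sketch of why unused levels matter, using a made-up `coldata` with three samples per cell type. Subsetting keeps the original factor levels, so the model matrix gets all-zero columns for the cell types that are no longer present, and that alone makes it rank deficient.

```r
# Illustrative sample sheet: three samples for each of four cell types.
coldata <- data.frame(
  typeCell = factor(rep(c("SN1", "MN", "PN", "SN2"), each = 3))
)

# Keep only two cell types; the factor still remembers all four levels.
sub <- coldata[coldata$typeCell %in% c("SN1", "MN"), , drop = FALSE]
levels_before <- levels(sub$typeCell)

# The columns for PN and SN2 are entirely zero, so the matrix is not full rank.
mm_before   <- model.matrix(~ typeCell, data = sub)
rank_before <- qr(mm_before)$rank

# Dropping the unused levels removes those columns.
sub$typeCell <- droplevels(sub$typeCell)
mm_after   <- model.matrix(~ typeCell, data = sub)
rank_after <- qr(mm_after)$rank

c(columns = ncol(mm_before), rank = rank_before)
c(columns = ncol(mm_after), rank = rank_after)
```

On a subsetted DESeqDataSet the same cleanup would typically be applied to the column itself, for example `dds_keep$typeCell <- droplevels(dds_keep$typeCell)`, before running the analysis.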

I also strongly recommend you not just run the DESeq command like that. Specify the contrasts you want. The idea is for it to be as easy as possible for you to figure out what you did 6 months from now. To compare two subsets to each other, use the concatenated column as the design, and specify the comparison you want with the contrast argument of results() after running DESeq.
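A rough sketch of that workflow follows, assuming DESeq2 is installed. The counts are simulated with `makeExampleDESeqDataSet()` as a stand-in for the real count matrix, and the sample annotation (16 samples, the `group` column name) is invented for illustration. In DESeq2 the contrast is passed to `results()` after `DESeq()` has been run.

```r
library(DESeq2)

set.seed(1)

# Simulated stand-in for the real data: 500 genes, 16 samples.
dds <- makeExampleDESeqDataSet(n = 500, m = 16)

# Hypothetical annotation: 4 samples per experiment-by-cell-type combination.
dds$experiment <- factor(rep(c("C", "L"), each = 8))
dds$typeCell   <- factor(rep(rep(c("SN1", "MN"), each = 4), times = 2))
dds$group      <- factor(paste(dds$experiment, dds$typeCell, sep = "_"))

# Use the concatenated column as the design and fit once on all samples.
design(dds) <- ~ group
dds <- DESeq(dds)

# Control SN1 versus control MN: the second level named is the numerator,
# the third is the denominator of the log2 fold change.
res <- results(dds, contrast = c("group", "C_SN1", "C_MN"))

head(res[order(res$padj), ])
```

Writing the contrast out by name keeps the comparison explicit in the script, and other pairs (for example `c("group", "L_SN1", "C_SN1")`) can be pulled from the same fitted object without refitting.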
