Question

DESeq2 multiple variables!

5.5 years ago

andreiareis1987 ▴ 40

Hi there, I have some doubts about my analysis. I have several variables in my RNA-seq analysis and I want to get the DEGs.

sample          experiment  typeCell  batch
SD1-1_S6        C           SN1       1
SD2-2_S13       C           SN1       1
SD3-1_S4        C           SN1       1
SD4-1_S17       C           SN1       1
S-SC1-1_S8      C           MN        1
S-SC2-1_S16     C           MN        1
S-SC3_S11       C           MN        1
S-SC4-1_S14     C           MN        1
TD1-2_S18       L           SN1       1
TD2-1_S9        L           SN1       1
TD3-1_S2        L           SN1       1
TD4-1_S1        L           SN1       1
TD6-1_S3        L           SN1       1
T-SC2-1_S5      L           MN        1
T-SC3-1_S12     L           MN        1
T-SC4-1_S10     L           MN        1
T-SC5_S15       L           MN        1
T-SC6_S7        L           MN        1
SCI-C-2-S3      L           PN        2
SCI-C-4-S6      L           PN        2
SCI-C-5-S7      L           PN        2
SCI-C-6-S17     L           PN        2
SCI-C-7-S11     L           PN        2
SCI-DL-1-S8     L           SN2       2
SCI-DL-2-S12    L           SN2       2
SCI-DL-4-S10    L           SN2       2
SCI-DL-6-S4     L           SN2       2
SCI-DR-5-S18    L           SN2       2
SHA-C-4-1-S13   C           PN        2
SHA-C-6-S15     C           PN        2
SHA-C-7-S16     C           PN        2
SHA-C-8-S5      C           PN        2
SHA-DL-1-S14    C           SN2       2
SHA-DL-4-S2     C           SN2       2
SHA-DR-5-S1     C           SN2       2
SHA-DR-8-S9     C           SN2       2

So, to compare only the C (control) samples between the MN and SN1 cells, I did:

data2 <- data[data$experiment == "C", ]
data2 <- data2[data2$typeCell %in% c("SN1", "MN"), ]
data2$batch <- NULL
table2 <- table[, grep(pattern = ".+SHAM\\_MN.+|PNI\\-SHAM\\_SN.+", x = colnames(table))]
dds3 <- DESeqDataSetFromMatrix(countData = table2, colData = data2, design = ~typeCell)
dds3 <- estimateSizeFactors(dds3, controlGenes = index)
dds3 <- DESeq(dds3)

My question is whether it is correct to filter for the conditions before doing the normalization, or whether I need to filter after the normalization.

Note: I took this approach because when I filtered after the normalization I got the error "Error in checkFullRank(modelMatrix) : ...". I tried to check for redundant columns, but I still get the error!

Thanks in advance for your time.

Best Regards, Andreia

deseq2 R • 2.4k views

updated 5.5 years ago by swbarnes2 15k • written 5.5 years ago by andreiareis1987 ▴ 40

You probably got the error because batch is redundant with cell type: batches 1 and 2 contain different cell types (a bad experimental design), so the two effects cannot be separated in the model. Try removing batch and running it again. You can do the normalization either way, but if you think gene expression is broadly similar across all cell types, you should normalize using all the samples, since that gives a better estimate of the expression variance.
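The redundancy can be checked without DESeq2 at all. The following is a minimal base R sketch, assuming the sample layout from the table in the question; `coldata` and `n_per_group` are hypothetical names used only for illustration, and the model formulas are examples rather than the asker's actual design.

```r
# Sample layout reconstructed from the table in the question (no counts needed).
n_per_group <- c(4, 4, 5, 5, 5, 5, 4, 4)
coldata <- data.frame(
  experiment = factor(rep(c("C", "C", "L", "L", "L", "L", "C", "C"),
                          times = n_per_group)),
  typeCell   = factor(rep(c("SN1", "MN", "SN1", "MN", "PN", "SN2", "PN", "SN2"),
                          times = n_per_group)),
  batch      = factor(rep(c(1, 2), times = c(18, 18)))
)

# Each cell type occurs in exactly one batch, so batch is fully determined
# by typeCell.
print(table(coldata$batch, coldata$typeCell))

# With both terms, the model matrix has more columns than its rank:
# this is the situation checkFullRank() complains about.
mm_full   <- model.matrix(~ batch + typeCell, data = coldata)
rank_full <- qr(mm_full)$rank
cat("columns:", ncol(mm_full), " rank:", rank_full, "\n")

# Without batch, the matrix is full rank.
mm_reduced   <- model.matrix(~ typeCell, data = coldata)
rank_reduced <- qr(mm_reduced)$rank
cat("columns:", ncol(mm_reduced), " rank:", rank_reduced, "\n")
```

If the rank is smaller than the number of columns, at least one term in the design is a linear combination of the others and has to be dropped or recoded.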

5.5 years ago by Asaf 10k

I removed the column and I still got the same error. :(

5.5 years ago by andreiareis1987 ▴ 40

Can you add what you tried and how it failed?

5.5 years ago by Asaf 10k

score 0 · Answer 1 · 2020-01-16

If you want to compare two subsets of samples to each other...don't do it like this. I think some of these chopping steps are wrong, and that's why you have an error.

Make a new column that has experiment concatenated with celltype.
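A minimal base R sketch of that step, assuming the sample layout from the question. The names `coldata`, `n_per_group` and the new column `group` are hypothetical, chosen for illustration.

```r
# Sample layout reconstructed from the table in the question.
n_per_group <- c(4, 4, 5, 5, 5, 5, 4, 4)
coldata <- data.frame(
  experiment = rep(c("C", "C", "L", "L", "L", "L", "C", "C"),
                   times = n_per_group),
  typeCell   = rep(c("SN1", "MN", "SN1", "MN", "PN", "SN2", "PN", "SN2"),
                   times = n_per_group)
)

# One factor level per experiment/cell-type combination, e.g. "C_MN", "L_SN2".
coldata$group <- factor(paste(coldata$experiment, coldata$typeCell, sep = "_"))

# Inspect how many samples fall into each combined group.
print(table(coldata$group))
```

With a single combined factor, every pairwise comparison (for example C_SN1 against C_MN) can be named directly instead of being pieced together from two separate columns.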

Make the dds with all the data. If you really don't want all the samples normalized together (most of the time you do want them normalized together, even the ones you aren't directly comparing), don't do it by chopping up your input files. Make new dds objects by subsetting the full one, like

dds_keep <- dds[,colnames(dds) %in% keep]

or

dds_mytissue <-dds[ ,dds$Tissue %in% c('mytissue')]

You might need some droplevels() calls to clean up factor levels in the design that no longer have any samples after subsetting.
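A sketch of the subset-then-clean-up pattern, assuming DESeq2 is installed. The counts come from DESeq2's `makeExampleDESeqDataSet()` simulator and are not the asker's data; only the sample layout follows the question. The names `group`, `keep` and `dds_sub` are hypothetical.

```r
library(DESeq2)

set.seed(1)
# Simulated counts for 36 samples; stands in for the real count table.
dds <- makeExampleDESeqDataSet(n = 500, m = 36)

# Attach the question's sample layout and a combined grouping column.
n_per_group <- c(4, 4, 5, 5, 5, 5, 4, 4)
dds$experiment <- factor(rep(c("C", "C", "L", "L", "L", "L", "C", "C"),
                             times = n_per_group))
dds$typeCell <- factor(rep(c("SN1", "MN", "SN1", "MN", "PN", "SN2", "PN", "SN2"),
                           times = n_per_group))
dds$group <- factor(paste(dds$experiment, dds$typeCell, sep = "_"))

# Size factors estimated on ALL samples travel with the object when subsetting.
dds <- estimateSizeFactors(dds)

# Subset the object, not the input files.
keep <- c("C_SN1", "C_MN")
dds_sub <- dds[, dds$group %in% keep]

# The factor still remembers all 8 levels; drop the ones without samples,
# otherwise the model matrix gets all-zero columns.
dds_sub$group <- droplevels(dds_sub$group)
design(dds_sub) <- ~ group

print(levels(dds_sub$group))
print(ncol(dds_sub))
```

Because the size factors are already stored in the object, a later `DESeq(dds_sub)` call reuses them rather than re-estimating them from the subset.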

I also strongly recommend that you not just run the DESeq command like that and take the default output. Specify the contrasts you want. The idea is to make it as easy as possible for you to figure out what you did 6 months from now. To compare two subsets to each other, use the concatenated column as the design, and specify the comparison you want with the contrast argument when you extract results().
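A sketch of an explicit contrast on the concatenated column, assuming DESeq2 is installed. The counts are again simulated with `makeExampleDESeqDataSet()`, so the numbers are meaningless; `group` and `res` are hypothetical names.

```r
library(DESeq2)

set.seed(1)
# Simulated counts; replace with DESeqDataSetFromMatrix() on real data.
dds <- makeExampleDESeqDataSet(n = 500, m = 36)

# Sample layout from the question, plus the concatenated column.
n_per_group <- c(4, 4, 5, 5, 5, 5, 4, 4)
dds$experiment <- factor(rep(c("C", "C", "L", "L", "L", "L", "C", "C"),
                             times = n_per_group))
dds$typeCell <- factor(rep(c("SN1", "MN", "SN1", "MN", "PN", "SN2", "PN", "SN2"),
                           times = n_per_group))
dds$group <- factor(paste(dds$experiment, dds$typeCell, sep = "_"))

# Fit one model on all samples, with the combined factor as the design.
design(dds) <- ~ group
dds <- DESeq(dds)

# Name the comparison explicitly: factor, numerator level, denominator level.
# Here: control SN1 relative to control MN.
res <- results(dds, contrast = c("group", "C_SN1", "C_MN"))
print(head(res))
```

Other comparisons, for example `c("group", "L_MN", "C_MN")`, can be pulled from the same fitted object without rerunning DESeq().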