Question: Experimental design RNAseq differential expression
0
gravatar for Pietro
4 weeks ago by
Pietro50
Italy
Pietro50 wrote:

Dear users,

I am struggling to understand if my design is correct. I found edgeR section 3.3.1 similar to my situation but I am not that confident.

Here is my experimental design:

samples_table

sampleId    cellLine    treatment   time        IC
s1                 a      vehicle      0         S
s2                 a         drug     48         S
s3                 a         drug    168         S
s4                 b      vehicle      0         S
s5                 b         drug     48         S
s6                 b         drug    168         S
s7                 c      vehicle      0         S
s8                 c         drug     48         S
s9                 c         drug    168         S
s10                d      vehicle      0         S
s11                d         drug     48         S
s12                d         drug    168         S
s13                e      vehicle      0         R
s14                e         drug     48         R
s15                e         drug    168         R
s16                f      vehicle      0         R
s17                f         drug     48         R
s18                f         drug    168         R

I have 6 cell lines treated with a drug and RNA sequenced after 48 and 168 hours of treatment. Last column indicates if the cell line is susceptible or resistant to another compound.

I would like to find how resistant cell lines differentially respond to the drug at 48 and/or at 168 hours compared to resistant ones.

Here is my approach:

group <- factor(paste(samples_table$IC, samples_table$time, sep="."))
y <- DGEList(counts.keep, group=group)
y <- calcNormFactors(y)
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)
y <- estimateDisp(y, design)

# Quasi-likelihood test
fit <- glmQLFit(y, design)

design

   R.0 R.168 R.48 S.0 S.168 S.48
1    0     0    0   1     0    0
2    0     0    0   0     0    1
3    0     0    0   0     1    0
4    0     0    0   1     0    0
5    0     0    0   0     0    1
6    0     0    0   0     1    0
7    0     0    0   1     0    0
8    0     0    0   0     0    1
9    0     0    0   0     1    0
10   1     0    0   0     0    0
11   0     0    1   0     0    0
12   0     1    0   0     0    0
13   1     0    0   0     0    0
14   0     0    1   0     0    0
15   0     1    0   0     0    0
attr(,"assign")
[1] 1 1 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$group
[1] "contr.treatment"

my.contrasts <- makeContrasts(
  S48 = S.48 - S.0,
  S168 = S.168 - S.0,
  S168vs48 = S.168 - S.48,
  R48 = R.48 - R.0,
  R168 = R.168 - R.0,
  R168vs48 = R.168 - R.48,
  RvsS.0 = R.0 - S.0,
  RvsS.48 = (R.48 - R.0) - (S.48 - S.0),
  RvsS.168 = (R.168 - R.0) - (S.168 - S.0),
  all = (R.48 + R.168 - R.0) - (S.48 + S.168 - S.0),
  levels = design
)

# to find genes that differentially respond at 48h between resistant and susceptible cell lines

qlf <- glmQLFTest(fit, 
                  contrast = my.contrasts[ , "RvsS.48"])

topTags(qlf)

What do you think? Shouldn't I account the fact that I have different cell lines as a sort of batch effect?

Thanks a lot

Pietro

PS: cross posted to https://support.bioconductor.org/p/119310/

ADD COMMENTlink modified 22 days ago by Biostar ♦♦ 20 • written 4 weeks ago by Pietro50

As you posted on Bioconductor, Aaron or Gordon will likely respond, and their answer will supersede any answer given here. On face value —yes— you should be adjusting for cell-line by, for example, including cellLine in your design formula. So, something like:

~ cellLine + group

Before doing this, I would also just check via a PCA bi-plot to see how cellLine is distributed across your samples

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by Kevin Blighe41k

Thanks Kevin,

Tried that, but it throws an "Design matrix not of full rank" error.

ADD REPLYlink written 4 weeks ago by Pietro50

You can't include both cellLine and IC, they're mutually exclusive.

ADD REPLYlink written 4 weeks ago by Devon Ryan89k

Hi Devon,

Realized that. So, what do you suggest to account for cellLine in my design?

Thanks

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by Pietro50

Remove IC and perform any relevant grouping of it as needed during a contrast.

ADD REPLYlink written 4 weeks ago by Devon Ryan89k

You mean

design <- model.matrix(~ 0 + cellLine + time, 
                       levels = samples_table)

?

And then I have

     a   b    c    d   e     f time48 time168
1    1   0    0    0   0     0      0       0
2    1   0    0    0   0     0      1       0
3    1   0    0    0   0     0      0       1
4    0   1    0    0   0     0      0       0
5    0   1    0    0   0     0      1       0
6    0   1    0    0   0     0      0       1
7    0   0    1    0   0     0      0       0
8    0   0    1    0   0     0      1       0
9    0   0    1    0   0     0      0       1
10   0   0    0    1   0     0      0       0
11   0   0    0    1   0     0      1       0
12   0   0    0    1   0     0      0       1
13   0   0    0    0   1     0      0       0
14   0   0    0    0   1     0      1       0
15   0   0    0    0   1     0      0       1
16   0   0    0    0   0     1      0       0
17   0   0    0    0   0     1      1       0
18   0   0    0    0   0     1      0       1

How do I specify that I want time48 only for cell lines e and f against time48 for the other 4 cell lines? Thanks

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by Pietro50

If you want specific group comparisons like that it's more convenient to make groups like a_time48, a_time168 and use ~0 + group as a model.

ADD REPLYlink written 28 days ago by Devon Ryan89k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1157 users visited in the last hour