Question

finding differentially expressed genes in three arm treatment RNA seq data

0

Entering edit mode

9.9 years ago

HNK ▴ 150

hello

I am interested in finding the differentially expressed genes using RNA seq data. I have the tag count file (expression file). I have used edgeR and DESeq before. the problem right now i have is that i have treatments (A, b and C) and no controls and interested to find genes that are DE between any of the groups.

What i did is using edgeR, i compared C to the average of A and B, A to the avg. of B and C and so on. And i get 3 separate list of DEG. I am not sure what i am doing is right or there is some other way to do it?

RNA-Seq DEG 3 arm treatment • 6.1k views

ADD COMMENT • link updated 9.9 years ago by dariober 14k • written 9.9 years ago by HNK ▴ 150

0

Entering edit mode

What's right or wrong will depend on the question you want to answer. So base your design on that.

ADD REPLY • link 9.9 years ago by Devon Ryan 104k

0

Entering edit mode

Thanku soo much martombo and devon... I actually want to find the survival genes among all these 3 treatments.

Ok I will be using Anova like test from edgeR, but do I have to consider intercept or not.

design=model.matrix(~0+group, data=y$samples)

or

design=model.matrix(~group, data=y$samples)

Which option should I use, for me I think, what devon said, as it represents the experimental design well.

lrt <- glmLRT(fit, coef=2:3)

What does coef=2:3 means?

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 9.9 years ago by HNK ▴ 150

0

Entering edit mode

Firstly, it depends on what one means by "survival genes". If I were to write that, I would mean that I have patients with some disorder who had undergone a variety of different treatments, of variable effectiveness, and that I'm looking for changes that correlate with favorable prognosis within/between treatments. We only know that you have 3, presumably non-overlapping, groups so it's difficult to give too much insight there.

What I suspect you'll actually want to do is make 3 pairwise comparisons between the various groups. The simplest way to do that is design=model.matrix(~0+group, data=y$samples) and then use some contrasts for each comparisons. There's very likely an example of this in the edgeR user guide. Then you'll have a list of AvsB, AvsC and BvsC differences, which is probably most useful.

The coef=2:3 bit makes sense if you allow for an intercept. Basically it's looking for difference between calling all the samples groupA and giving them their proper assignments. This is effectively how a 1-way ANOVA works, though it's debateable how biologically relevant the results would be (perhaps it's very informative, perhaps not, it depends on the exact nature of the study). I should note that if you do allow for an intercept, then coef=2 would be the genes DE in groupB vs groupA and coef=3 would be those DE in groupC vs. A. The results this way should be identical to using contrasts, so at the end of the day either will work.

BTW, it's useful to know what an intercept is doing in a model matrix. Basically, it's a group that applies to all of your sample and against which all of your other groups are compared. In your case, this would be groupA. An intercept nicely matches experiments with factorial designs such as an experiment where WT and Mutant animals are treated with a drug or control. That's a classical 2x2 factorial design, where the intercept would be WT animals treated with control, since you'd be interested in changes relative to that. When you have multiple groups that you want to directly compare, then it's usually less confusing to just omit the intercept, since there's no single group against which things are being compared. You can usually doing things with/without an intercept and get the same results, but you'll be best served by not departing too much from the actual experimental design (doing so is a quick way to get confused about what the results actually mean).

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 9.9 years ago by Devon Ryan 104k

0

Entering edit mode

> design=model.matrix(~0+group, data=y$samples) 
> head(design)
                A    B    C
S_1             1    0    0
S_100           1    0    0
S_102           1    0    0
S_104           1    0    0
S_105           1    0    0
S_106           1    0    0

> fit <- glmFit(y, design)
> lrt <- glmLRT(fit, coef=2:3)

I used these command.. Why am I getting the p value and FDR all zero??????

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 9.9 years ago by HNK ▴ 150

0

Entering edit mode

Your samples are only in one group! It looks like there's a problem with y$samples.

ADD REPLY • link 9.9 years ago by Devon Ryan 104k

0

Entering edit mode

Oh, actually, as I mentioned in my comment above, the coef=2:3 part won't make sense without an intercept. That's likely the cause of the wonky output (I'm guessing that there are samples in each of the groups, just that they're not shown).

ADD REPLY • link 9.9 years ago by Devon Ryan 104k

0

Entering edit mode

oh No actually I do have samples in all the groups. It's just that I sorted them, that's why the head(design) just showing samples in one group).

Ok now I did include the intercept and used coef 2:3

> head(design)
        (Intercept)         B         C
S_1               1         0         0
S_100             1         0         0
S_102             1         0         0
S_104             1         0         0
S_105             1         0         0
S_106T1           1         0         0

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 9.9 years ago by HNK ▴ 150

0

Entering edit mode

> topTags(lrt)
         logFC.B      logFC.C    logCPM       LR   PValue FDR
OR4F5      -26.78231  -26.02834 -2.833065 3203.629      0   0
MIR429     -26.78231  -26.07395 -2.799337 3169.653      0   0
MIR4252    -26.78231  -26.38842 -2.842051 3208.550      0   0
PRAMEF13   -26.78231  -26.35145 -2.841839 3208.086      0   0
GUCA2A     -26.78231  -25.41435 -2.741271 3125.508      0   0
DMRTB1     -26.78231  -26.04102 -2.833144 3169.287      0   0
NBPF6      -26.78231  -24.09955 -2.692791 3181.497      0   0
KCNA10     -26.78231  -25.27517 -2.774327 3123.293      0   0
LOC643441  -26.78231  -26.07788 -2.799318 3204.186      0   0
MIR320B1   -26.78231  -26.35638 -2.824551 3208.138      0   0

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 9.9 years ago by HNK ▴ 150

Ram · Answer 1 · 2014-06-04

What about this approach based on edgeRUserGuide 3.3.1? It can be easily expanded to more complicated designs...

targets<- data.frame(condition= rep(c('A', 'B', 'C'), each= 2), 
    row.names= c('sampleA1', 'sampleA2', 'sampleB1', 
    'sampleB2', 'sampleC1', 'sampleC2'))

# > targets
#         condition
# sampleA1         A
# sampleA2         A
# sampleB1         B
# sampleB2         B
# sampleC1         C
# sampleC2         C

matrixOfRawCounts<- matrix(rpois(n= 600, 100), ncol= 6)
Group<- as.factor(rep(c('A', 'B', 'C'), each= 2))

design <- model.matrix(~0+Group)
colnames(design) <- levels(Group)
row.names(design)<- row.names(targets)

#> design
#         A B C
# sampleA1 1 0 0
# sampleA2 1 0 0
# sampleB1 0 1 0
# sampleB2 0 1 0
# sampleC1 0 0 1
# sampleC2 0 0 1
# attr(,"assign")
# [1] 1 1 1
# attr(,"contrasts")
# attr(,"contrasts")$Group
# [1] "contr.treatment"


## Now decide your contrasts of interest
contrasts<- makeContrasts(
    A_vs_B= A - B,
    A_vs_C= A - C,
    A_vs_BC= A - (B+C)/2,
    # etc...
    levels= design
)
#       Contrasts
# Levels A_vs_B A_vs_C A_vs_BC
#      A      1      1     1.0
#      B     -1      0    -0.5
#      C      0     -1    -0.5

## Fit model
d <- DGEList(counts = matrixOfRawCounts, group = Group)
d <- calcNormFactors(d)
d<- estimateGLMCommonDisp(d, design)
d<- estimateGLMTagwiseDisp(d, design)
fit<- glmFit(d, design)

## Get DEGs
lrt<- glmLRT(fit, contrast= contrasts[, "A_vs_B"])
detable<- topTags(lrt)

lrt<- glmLRT(fit, contrast= contrasts[, "A_vs_C"])
detable<- topTags(lrt)

lrt<- glmLRT(fit, contrast= contrasts[, "A_vs_BC"])
detable<- topTags(lrt)

Ram · Answer 2 · 2014-06-02

2

Entering edit mode

9.9 years ago

Martombo ★ 3.1k

I would suggest you not to perform an analysis with the average of two conditions. If you do so you're losing some information (for example you're not taking into account how similar is the expression in these two conditions). it would be better to group B and C together as one condition and test it against A.
anyway since you're using edgeR, you might want to try the approach described in edgeR manual under "An ANOVA-like test for any diﬀerences". That is the proper test to find a difference between "any" group as you wrote.
baySeq could also be useful to you. with that program you can define custom models for your data and test them at once.

ADD COMMENT • link 9.9 years ago by Martombo ★ 3.1k

0

Entering edit mode

the command in edgeR for the ANOVA-like test is the following:

lrt <- glmLRT(fit, coef=2:3)

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 9.9 years ago by Martombo ★ 3.1k

0

Entering edit mode

That only makes sense if there's an intercept, which there isn't.

ADD REPLY • link 9.9 years ago by Devon Ryan 104k

0

Entering edit mode

the intercept term could be provided by one of the conditions right? like in the following example:

> design

        (Intercept)  groupB  groupC
sample.1          1       0       0
sample.2          1       1       0
sample.3          1       0       1

I think that's a good way to test for "any" difference between the group conditions. but of course it would be useful to have a more detailed explanation on what are the biological questions to be answered.

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 9.9 years ago by Martombo ★ 3.1k

2

Entering edit mode

Yes and no. Yes that makes it convenient to test for pair-wise or combined effects, but "no" in that one should be very careful in recommending that to someone unfamiliar with model matrices since it doesn't actually represent the experimental design, which is something like:

         GroupA GroupB GroupC
Sample1       1      0      0
Sample2       0      1      0
Sample3       0      0      1

It's likely a better idea to just suggest using contrasts, which will allow HNK to keep the design that matches the experiment, instead having the software rearrange the matrix for the actual comparisons (both edgeR and DESeq2 will do that for the end user). I only see this as better in that it's less likely to be misunderstood by the uninitiated (otherwise, yeah, your solution would seem quite proper).

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 9.9 years ago by Devon Ryan 104k

0

Entering edit mode

thank you very much, it's very clear now!

ADD REPLY • link 9.9 years ago by Martombo ★ 3.1k