Question: finding differentially expressed genes in three arm treatment RNA seq data
0
gravatar for HNK
4.9 years ago by
HNK120
Netherlands
HNK120 wrote:

hello

I am interested in finding the differentially expressed genes using RNA seq data. I have the tag count file (expression file). I have used edgeR and DESeq before. the problem right now i have is that i have treatments (A, b and C) and no controls and interested to find genes that are DE between any of the groups.

What i did is using edgeR, i compared C to the average of A and B, A to the avg. of B and C and so on. And i get 3 separate list of DEG. I am not sure what i am doing is right or there is some other way to do it?

deg rna-seq 3 arm treatment • 3.2k views
ADD COMMENTlink modified 4.9 years ago by dariober10.0k • written 4.9 years ago by HNK120

What's right or wrong will depend on the question you want to answer. So base your design on that.

ADD REPLYlink written 4.9 years ago by Devon Ryan89k
4
gravatar for dariober
4.9 years ago by
dariober10.0k
WCIP | Glasgow | UK
dariober10.0k wrote:

What about this approach based on edgeRUserGuide 3.3.1? It can be easily expanded to more complicated designs...

targets<- data.frame(condition= rep(c('A', 'B', 'C'), each= 2),

    row.names= c('sampleA1', 'sampleA2', 'sampleB1',

    'sampleB2', 'sampleC1', 'sampleC2'))

# > targets
#         condition
# sampleA1         A
# sampleA2         A
# sampleB1         B
# sampleB2         B
# sampleC1         C
# sampleC2         C

matrixOfRawCounts<- matrix(rpois(n= 600, 100), ncol= 6)

Group<- as.factor(rep(c('A', 'B', 'C'), each= 2))

design <- model.matrix(~0+Group)
colnames(design) <- levels(Group)
row.names(design)<- row.names(targets)

#> design
#         A B C
# sampleA1 1 0 0
# sampleA2 1 0 0
# sampleB1 0 1 0
# sampleB2 0 1 0
# sampleC1 0 0 1
# sampleC2 0 0 1
# attr(,"assign")
# [1] 1 1 1
# attr(,"contrasts")
# attr(,"contrasts")$Group
# [1] "contr.treatment"


## Now decide your contrasts of interest
contrasts<- makeContrasts(
    A_vs_B= A - B,
    A_vs_C= A - C,
    A_vs_BC= A - (B+C)/2,
    # etc...
    levels= design
)
#       Contrasts
# Levels A_vs_B A_vs_C A_vs_BC
#      A      1      1     1.0
#      B     -1      0    -0.5
#      C      0     -1    -0.5

## Fit model
d <- DGEList(counts = matrixOfRawCounts, group = Group)
d <- calcNormFactors(d)
d<- estimateGLMCommonDisp(d, design)
d<- estimateGLMTagwiseDisp(d, design)
fit<- glmFit(d, design)

## Get DEGs
lrt<- glmLRT(fit, contrast= contrasts[, "A_vs_B"])
detable<- topTags(lrt)

lrt<- glmLRT(fit, contrast= contrasts[, "A_vs_C"])
detable<- topTags(lrt)

lrt<- glmLRT(fit, contrast= contrasts[, "A_vs_BC"])
detable<- topTags(lrt)

ADD COMMENTlink modified 4.9 years ago • written 4.9 years ago by dariober10.0k

Yup, that's a perfect example of how to use contrasts.

ADD REPLYlink written 4.9 years ago by Devon Ryan89k
2
gravatar for Martombo
4.9 years ago by
Martombo2.4k
Seville, ES
Martombo2.4k wrote:

I would suggest you not to perform an analysis with the average of two conditions. If you do so you're losing some information (for example you're not taking into account how similar is the expression in these two conditions). it would be better to group B and C together as one condition and test it against A.
anyway since you're using edgeR, you might want to try the approach described in edgeR manual under "An ANOVA-like test for any differences". That is the proper test to find a difference between "any" group as you wrote.
baySeq could also be useful to you. with that program you can define custom models for your data and test them at once.

ADD COMMENTlink written 4.9 years ago by Martombo2.4k

the command in edgeR for the ANOVA-like test is the following:
 

lrt <- glmLRT(fit, coef=2:3)

 

ADD REPLYlink modified 4.9 years ago • written 4.9 years ago by Martombo2.4k

That only makes sense if there's an intercept, which there isn't.

ADD REPLYlink written 4.9 years ago by Devon Ryan89k

the intercept term could be provided by one of the conditions right? like in the following example: 
 

>design

        (Intercept)  groupB  groupC

sample.1          1       0       0

sample.2          1       1       0

sample.3          1       0       1

 

I think that's a good way to test for "any" difference between the group conditions. but of course it would be useful to have a more detailed explanation on what are the biological questions to be answered.

ADD REPLYlink written 4.9 years ago by Martombo2.4k
2

Yes and no. Yes that makes it convenient to test for pair-wise or combined effects, but "no" in that one should be very careful in recommending that to someone unfamiliar with model matrices since it doesn't actually represent the experimental design, which is something like:

         GroupA GroupB GroupC
Sample1       1      0      0
Sample2       0      1      0
Sample3       0      0      1

It's likely a better idea to just suggest using contrasts, which will allow HNK to keep the design that matches the experiment, instead having the software rearrange the matrix for the actual comparisons (both edgeR and DESeq2 will do that for the end user). I only see this as better in that it's less likely to be misunderstood by the uninitiated (otherwise, yeah, your solution would seem quite proper).

ADD REPLYlink modified 4.9 years ago • written 4.9 years ago by Devon Ryan89k

thank you very much, it's very clear now!

ADD REPLYlink written 4.9 years ago by Martombo2.4k
0
gravatar for HNK
4.9 years ago by
HNK120
Netherlands
HNK120 wrote:

Thnaku soo much martombo and devon... I actually want to find the survival genes among all these 3 treatments.

Ok i will be using Anova like test from edgeR, but do i have to consider intercept or not.

design=model.matrix(~0+group, data=y$samples)

or

design=model.matrix(~group, data=y$samples)

Which option should i use, for me i think, what devon said, as it represents the experimental design well.

lrt <- glmLRT(fit, coef=2:3)

What does coef=2:3 means?

ADD COMMENTlink written 4.9 years ago by HNK120

Firstly, it depends on what one means by "survival genes". If I were to write that, I would mean that I have patients with some disorder who had undergone a variety of different treatments, of variable effectiveness, and that I'm looking for changes that correlate with favorable prognosis within/between treatments. We only know that you have 3, presumably non-overlapping, groups so it's difficult to give too much insight there.

What I suspect you'll actually want to do is make 3 pairwise comparisons between the various groups. The simplest way to do that is design=model.matrix(~0+group, data=y$samples) and then use some contrasts for each comparisons. There's very likely an example of this in the edgeR user guide. Then you'll have a list of AvsB, AvsC and BvsC differences, which is probably most useful.

The coef=2:3 bit makes sense if you allow for an intercept. Basically it's looking for difference between calling all the samples groupA and giving them their proper assignments. This is effectively how a 1-way ANOVA works, though it's debateable how biologically relevant the results would be (perhaps it's very informative, perhaps not, it depends on the exact nature of the study). I should note that if you do allow for an intercept, then coef=2 would be the genes DE in groupB vs groupA and coef=3 would be those DE in groupC vs. A. The results this way should be identical to using contrasts, so at the end of the day either will work.

BTW, it's useful to know what an intercept is doing in a model matrix. Basically, it's a group that applies to all of your sample and against which all of your other groups are compared. In your case, this would be groupA. An intercept nicely matches experiments with factorial designs such as an experiment where WT and Mutant animals are treated with a drug or control. That's a classical 2x2 factorial design, where the intercept would be WT animals treated with control, since you'd be interested in changes relative to that. When you have multiple groups that you want to directly compare, then it's usually less confusing to just omit the intercept, since there's no single group against which things are being compared. You can usually doing things with/without an intercept and get the same results, but you'll be best served by not departing too much from the actual experimental design (doing so is a quick way to get confused about what the results actually mean).

ADD REPLYlink written 4.9 years ago by Devon Ryan89k
0
gravatar for HNK
4.9 years ago by
HNK120
Netherlands
HNK120 wrote:
>design=model.matrix(~0+group, data=y$samples) 
> head(design)
                A    B    C
S_1             1    0    0
S_100           1    0    0
S_102           1    0    0
S_104           1    0    0
S_105           1    0    0
S_106           1    0    0

>fit <- glmFit(y, design)
>lrt <- glmLRT(fit, coef=2:3)

I used these command.. Why am i getting the p value and FDR all zero??????

 

ADD COMMENTlink written 4.9 years ago by HNK120

Your samples are only in one group! It looks like there's a problem with y$samples.

ADD REPLYlink written 4.9 years ago by Devon Ryan89k

Oh, actually, as I mentioned in my comment above, the coef=2:3 part won't make sense without an intercept. That's likely the cause of the wonky output (I'm guessing that there are samples in each of the groups, just that they're not shown).

ADD REPLYlink written 4.9 years ago by Devon Ryan89k

oh No actually i do have samples in all the groups. its just tht i sorted them, thats why the head(design) just showing samples in one group).

Ok now i did include the intercept and used coef 2:3

> head(design)
        (Intercept)         B         C
S_1               1         0         0
S_100             1         0         0
S_102             1         0         0
S_104             1         0         0
S_105             1         0         0
S_106T1           1         0         0

 

ADD REPLYlink written 4.9 years ago by HNK120
0
gravatar for HNK
4.9 years ago by
HNK120
Netherlands
HNK120 wrote:
>topTags(lrt)
         logFC.B      logFC.C    logCPM       LR   PValue FDR
OR4F5      -26.78231  -26.02834 -2.833065 3203.629      0   0
MIR429     -26.78231  -26.07395 -2.799337 3169.653      0   0
MIR4252    -26.78231  -26.38842 -2.842051 3208.550      0   0
PRAMEF13   -26.78231  -26.35145 -2.841839 3208.086      0   0
GUCA2A     -26.78231  -25.41435 -2.741271 3125.508      0   0
DMRTB1     -26.78231  -26.04102 -2.833144 3169.287      0   0
NBPF6      -26.78231  -24.09955 -2.692791 3181.497      0   0
KCNA10     -26.78231  -25.27517 -2.774327 3123.293      0   0
LOC643441  -26.78231  -26.07788 -2.799318 3204.186      0   0
MIR320B1   -26.78231  -26.35638 -2.824551 3208.138      0   0
ADD COMMENTlink written 4.9 years ago by HNK120
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1356 users visited in the last hour