Question: edgeR again: can you AND multiple conditions for analysis?
1
gravatar for moxu
24 months ago by
moxu440
moxu440 wrote:

I know in edgeR, in multivariate analysis, you can do "ceof=3:5" to test and get p-values for genes that are differentially expressed under ANY of the conditions specified by "coef=3:5". My understanding is that "ANY" here means OR? Meaning if a gene is differentially expressed in either condition 3 or 4 or 5 being compared to the control (constant term), then it would be shown significant as indicated by its p-value. How if I want to find DEGs that are expressed in 3 AND 4 AND 5? How to setup the parameters? I know you can make condition 3 & 4 & 5 to be one condition and test for it, but this is a little bit cumbersome. If you can use something like "coef=3:5" on the fly, it would awesome.

Thanks in advance!

rna-seq R • 1.3k views
ADD COMMENTlink modified 24 months ago by Devon Ryan90k • written 24 months ago by moxu440

By any chance, are you talking about interaction terms?

ADD REPLYlink written 24 months ago by cpad011211k

Not interaction terms. For instance, you treat a cell line with a compound of concentration 100nM, 200nM, 500nM. You want to find out if a gene 1) responds to any of the compound treatment; 2) responds to all of the compound treatment.

Does this make sense to you?

ADD REPLYlink written 24 months ago by moxu440

so it seems like you are expecting to make contrast specific design. In that case if you have already properly created your design matrix then it is not a problem to do the contrast specific DE test with edgeR and also do the entire estimation as well. Take a look at the links below.

link1

If you start looking at the edgeR manual from page 29, you will see that the exactTest() can make the classical pairwise test between your groups of interest If you intend to use the glm based approach then the successive tests are pretty well defined in the manual of edgeR from page 30 and how to set the contrast between particular groups as in case 3 vs 4,, 4 vs 5 , etc The key is to prepare the model.matrix() properly with the proper design of groups/levels and use contrast specific levels with glm-based approach. It should not be a problem to tackle it.

I can only say this much unless I see the design of your model matrix , it is difficult to address about the specific contrasts that you intend to set.

This link is pretty informative as well but I will still advice for manual

ADD REPLYlink written 24 months ago by ivivek_ngs4.8k
2
gravatar for Devon Ryan
24 months ago by
Devon Ryan90k
Freiburg, Germany
Devon Ryan90k wrote:

There's a bit of a misunderstanding here regarding what specifying multiple coefficients is actually testing. I have to admit that this is not terribly well spelled out in help(glmFit), although if you know how an LRT works and realize that you're feeding the coef vector to a function called glmLRT you can likely guess. To correct the misunderstanding, see the actual relevant code here.

In short, specifying coef=c(3:5) is not testing for a logical AND or an OR of DE genes in those coefficients. Rather, it's testing for genes for which the combined effect of c(3:5) is significant. In other words, it doesn't have to be the case that any individual coefficient is significant, just their combination. If you really want an AND operation, then test each of the coefficients, choose a more lax adjusted p-value cutoff and intersect the results. The equivalent for an OR operation would be the union of the results (you wouldn't need to loosen the p-value cutoff in that case).

As an aside, making c(3:5) a single group will produce different results still, since if those groups are very different from each other then the resulting coefficient will have large variance and be less likely to be picked up as significant.

ADD COMMENTlink written 24 months ago by Devon Ryan90k

In the newest (April 2017 edition) user's guide:

3.3.3 Treatment eects over all times
The nested interaction model makes it easy to nd genes that respond to the treatment at
any time, in a single test. Continuing the above example,
> lrt <- glmLRT(fit, coef=c(4,6))
nds genes that respond to the treatment at either 1 hour or 2 hours versus the 0 hour
baseline. This is analogous to an ANOVA F-test for a normal linear model.
36

So c(4,6) is testing either xx or xx.

ADD REPLYlink written 24 months ago by moxu440

The user guide is wrong.

ADD REPLYlink written 24 months ago by Devon Ryan90k

OK, maybe I see what you mean. Are you saying that c(3:5) actually treats 3, 4, 5 as one group (e.g. treated) vs the control (e.g. untreated), and each of 3, 4, or 5 is undistinguishable from each other? Or in technical terms, after normalization, 3, 4, and 5 are treated somewhat like replicates of their combined group? The mean/var of the combined group is used to compare with the control? If so, if you want to get, say, logFC of the combined group vs. the control, can you do it within edgeR?

Thanks much.

ADD REPLYlink written 24 months ago by moxu440

The nuts and bolts are that the comparison you're making for each gene is between two models, one with and the other without those three groups (i.e., with the samples but you're omitting those 3 coefficients from the model). So in the reduced model they're lumped together (or however else would be dictated by the model matrix), while in the full model they're treated as separate coefficients (i.e., the same as all of the other coefficients). The p-value is then from the difference in fit between the two models. It appears from the code at least that the logFC is kept per-coefficient, so I presume you still see all 3. That is useful for asking questions like, "might there be an effect of drug administration (ignoring concentration)" if you were treating with a drug at multiple concentrations. Since the individual coefficients aren't being tested, you can't conclude that at least one is significant, though the odds are decent that in practice one will be. You also can't conclude that all of the coefficients will be significant, since that's certainly not needed for a significant p-value.

ADD REPLYlink modified 23 months ago • written 23 months ago by Devon Ryan90k

I guess the question can be simplified as whether the test of coef=3:5 uses F test and 1) treats condition 3, 4, and 5 as one group, and the control as the other group -- total 2 groups; or 2) treats condition 3, 4, and 5 as separate groups, and analyzed together with the control group -- total 4 groups.

It should be the 1st, I guess? For at least the 2nd can be comparing among condition 3, 4, and 5.

ADD REPLYlink written 23 months ago by moxu440

As I just tested a moment ago, edgeR does not treat condition 3, 4, & 5 as one group because the p-values generated by grouping 3, 4 & 5 together in the design are very different from those generated by specifying coef=3:5.

Now I am confused.

ADD REPLYlink written 23 months ago by moxu440

It's doing exactly what I wrote in my last comment. That is, it's doing a likelihood ratio test between two models. This explanation cannot be simplified into any other terms.

ADD REPLYlink written 23 months ago by Devon Ryan90k
1

I could be very wrong -- I used "glmQLFTest" instead of "glmLRT". In one of the previous edgeR discussions on biostars (https://www.biostars.org/p/247174/#247200), someone pointed out that one should normally use glmQLFTest instead of glmLRT unless there are enough replicates. Otherwise glmLRT would generate more false positives.

Thanks a lot.

ADD REPLYlink written 23 months ago by moxu440

Ah yes, glmQLFTest is preferred, thanks for reminding me (it shows that I generally use DESeq2)! Anyway, the first thing that does is run glmLRT, so the underlying model comparison is the same, though how the p-value is generated isn't.

ADD REPLYlink written 23 months ago by Devon Ryan90k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 998 users visited in the last hour