Question

How to make contrasts to find genes DE in one experimental group but NOT in others?

1

Entering edit mode

16 months ago

bkleiboeker ▴ 370

Hi all, I have an RNAseq dataset I am analyzing of gene expression from patients with a variety of illnesses. The design matrix is generally as follows:

> patient <- 1:12
> illness <- c(rep("control",3),rep("viral",3),rep("bacterial",3),rep("bactrial_and_viral",3))
> design <- model.matrix(~0+illness)
> rownames(design) <- patient

I want to find genes which are DE only in the bacterial group (relative to control), genes which are DE only in the viral group (relative to control), and genes which are DE only in the combined bacterial+viral group (relative to control). I am having trouble deciding which contrasts are appropriate to do this.

I know that if i just wanted to find genes which change in bacterial but not viral I could do:

> makeContrasts(bacterial_only = (bacterial-control)-(viral-control), levels = design)

But I think this would not exclude genes which also change in the bacterial_and_viral condition.

I cannot find the appropriate way to do the analogous comparison with more than 1 other experimental group. Could I do the following?

> makeContrasts(bacterial_only = (bacterial-control) - ( (viral-control) + (bacterial_and_viral-control) )/2, levels = design)

I'd really appreciate any insight on how to find genes which change in only one experimental group relative to control. I'm also worried I might be overthinking this--could this be accomplished by simply:

> makeContrasts (bacterial_only = bacterial-control, levels = design)

Does this find genes only DE in bacterial and not in other experimental groups?

Thanks for any insight you all can provide!

Brian

edgeR limma matrix design R • 1.9k views

ADD COMMENT • link updated 15 months ago by Gordon Smyth ★ 7.0k • written 16 months ago by bkleiboeker ▴ 370

1

Entering edit mode

why don't you do each experimental group separately and then subtract the genes which are common with the others?

ADD REPLY • link 16 months ago by asalimih ▴ 60

0

Entering edit mode

That's a good idea, I hadn't thought of that. I guess I was thinking that since there was a good way to compare the interaction across two experimental groups as in:

> makeContrasts(bacterial_only = (bacterial-control)-(viral-control), levels = design)

There might be a way to do it for 3 or more, but your idea would certainly be a good alternative. Thanks!

ADD REPLY • link 16 months ago by bkleiboeker ▴ 370

score 4 · Accepted Answer · 2022-12-25

4

Entering edit mode

16 months ago

Gordon Smyth ★ 7.0k

I know that if i just wanted to find genes which change in bacterial but not viral I could do: makeContrasts(bacterial_only = (bacterial-control)-(viral-control), levels = design)

That's not what the contrast does. The interaction contrast finds all genes for which the bacterial effect is significantly different to the viral effect. There's no assumption that a viral effect is absent. There might be strong effects in both bacteria and virus but in different directions. Or the effect could be just in bacteria. Or the effect could be just in virus. Or the effects in bacteria and virus could be in the same direction, but much bigger for one than the other.

You actually have only one control in this experiment, rather than separate controls for bacteria and virus. So the contrast

(bacterial-control)-(viral-control)

is exactly the same as

bacterial-viral

because the control cancels out. As you can see, the contrast just gives a differential expression test between bacterial and viral without regard to the control.

ADD COMMENT • link 16 months ago by Gordon Smyth ★ 7.0k

0

Entering edit mode

I think I understand the interaction contrast better now, thank you very much for clearing that up. I was actually playing around with contrasts and examining the contrast matrix itself across the variety of contrasts and your explanation helps me understand what I was seeing.

Do you think what I am describing is possible through the appropriate contrasts--finding genes DE in one and only one experimental condition all relative to the same control? Or would I be best served by following the other commenter's suggestion of doing the pairwise comparisons and subtracting common genes after the fact?

Thanks so much for your help and excellent explanations!

ADD REPLY • link 16 months ago by bkleiboeker ▴ 370

1

Entering edit mode

The way to do this is to compare each of the treatment groups to the control as three separate contrasts. Then run decideTests() and select rows that have a significant result for the contrast of interest and not for the other two.

ADD REPLY • link 16 months ago by Gordon Smyth ★ 7.0k

1

Entering edit mode

Interesting, would this not be making the assumption that not significant = not changed? Or have I missed something about decideTests?

I around have thought it better to look at the bacterial-viruses for significance, and then use the effect sizes from the vs control contrasts to group into the different possibilities for what a significant value in the bacteria vs virus meant.

ADD REPLY • link 15 months ago by i.sudbery 19k

3

Entering edit mode

decideTests() includes a "nestedF" option that really does attempt to classify contrasts as DE or not rather than using statistical significance.

However, if I want to identify genes that are DE for one contrast and not others, I generally prefer to use regular statistical significance but with two different signficance cutoffs, a rigorous cutoff for the contrast of interest (e.g., FDR < 0.05) and a lenient cutoff for the other contrasts (e.g., unadjusted P > 0.05). This approach avoids counting a gene as non-DE when its FDR just fails to reach the rigorous cutoff. It can be implemented by running decideTests twice.

We are not assuming that not signficant = not changed. As always in statistics, not signficantly DE simply means not enough evidence of a change. However, if the experiment has reasonable power, and a gene fails to reach a reasonable significance level, then there is a good chance that the true change isn't big enough to be important.

I around have thought it better to look at the bacterial-viruses for significance

That's what we're doing.

then use the effect sizes from the vs control contrasts

That's a downstream analysis. I don't see that as a practical analysis at the initial screening stage. And if you want to be rigorous, you would need to account for the statistical correlations between the vs-control contrasts, which in this case are substantial. E.g., by using nestedF or by comparing bacterial_and_viral to bacterial and viral directly.

ADD REPLY • link 15 months ago by Gordon Smyth ★ 7.0k

0

Entering edit mode

Thanks so much, I did not know about the decideTests() function but that accomplishes exactly what I want!

ADD REPLY • link 16 months ago by bkleiboeker ▴ 370