Include DEGs with a -l2fc in overrepresentation analysis?
3.2 years ago

Hello --

Hoping to gain some clarity on this topic. As I understand it, overrepresentation analysis (ORA) does not take log2 fold change (l2fc) values as input; it only tests whether a pathway is overrepresented in a given set of DEGs. In my head, it would make the most sense to filter out genes with negative l2fc values (for a specific group comparison) ahead of ORA, so that an ontology cannot be erroneously deemed "overrepresented" because its genes are differentially expressed but in the downward direction.

Thank you in advance for any insight provided!


There is no rule for this. Consider the example where a process such as autophagy is enhanced in cells. You would expect ATG genes to go up and negative regulators such as BCL2 to go down. Here it would make sense to include both, but in practice it is more convenient for interpretation to separate genes by the sign of the fold change. See whether the separate analyses give you a hypothesis; if not, include more genes.

3.2 years ago
Gordon Smyth ★ 8.3k

Personally I almost always separate DE genes by direction of change when doing ORAs, and this is done automatically by the goana and kegga functions in limma and edgeR when applied to fitted model objects. I find it hard to interpret the results if a separation is not done. See for example the Gene Ontology and KEGG analyses here:

There do arise situations when doing an ORA for all DE genes regardless of direction makes more sense, but in my research I find that such situations are less common. Many GO terms contain genes that are both positively and negatively correlated with the relevant biological process, so the same GO term might well be enriched in both the up-regulated and down-regulated DE genes for the same comparison.
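As a rough illustration of separating DE genes by direction, here is a minimal Python sketch. It is not how goana or kegga are implemented (those are R functions applied to fitted model objects); it only shows the general idea with a plain upper-tail hypergeometric test. The function name `ora_pvalue`, the gene lists, and the fold-change values are all hypothetical toy data.

```python
from math import comb

def ora_pvalue(de_genes, gene_set, universe):
    """Upper-tail hypergeometric P(X >= k): the chance of seeing at least k
    gene-set members among the DE genes if they were drawn at random
    from the universe of tested genes."""
    universe = set(universe)
    de = set(de_genes) & universe
    gs = set(gene_set) & universe
    N, K, n = len(universe), len(gs), len(de)
    k = len(de & gs)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total

# Hypothetical toy data: gene -> log2 fold change, for genes passing the DE cutoff.
de_l2fc = {"ATG5": 1.8, "ATG7": 1.2, "ATG12": 2.1, "BCL2": -1.5, "GENE_X": -0.9}
# Universe of all tested genes: 95 made-up background genes plus the 5 DE genes.
universe = [f"G{i}" for i in range(95)] + list(de_l2fc)
# Hypothetical gene set mixing positive and negative regulators.
autophagy_set = {"ATG5", "ATG7", "ATG12", "BCL2", "G0"}

# Split the DE genes by sign of the log2 fold change.
up = [g for g, l2fc in de_l2fc.items() if l2fc > 0]
down = [g for g, l2fc in de_l2fc.items() if l2fc < 0]

results = {
    "up": ora_pvalue(up, autophagy_set, universe),
    "down": ora_pvalue(down, autophagy_set, universe),
    "all": ora_pvalue(list(de_l2fc), autophagy_set, universe),
}
for label, p in results.items():
    print(f"{label}: p = {p:.3g}")
```

Note that the test itself never sees the fold changes; direction only enters through which genes are placed in each input list, which is why the same term can come out enriched in both the up and the down lists.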

3.2 years ago
ayy ▴ 10

I don't think you should conflate "overrepresentation" with "upregulation" or "downregulation". "Overrepresentation" simply means that more genes in the set are behaving abnormally (in either direction) than you would expect by chance, so you should be running ORA with all the information.

Gordon's reply is a good strategy; that is definitely one way to distinguish an upregulated pathway from a downregulated one. One thing you could miss, though, is that sometimes the upregulation of one pathway involves the downregulation of another, and that information could be lost if you run ORA on only the up-regulated or only the down-regulated genes.

I think a lot of these pathway analyses should be run a number of different ways to arrive at some "biological truth". That is, you should run ORA on all of the differentially expressed genes, then on the upregulated genes, and then on the downregulated genes; taken together, the results may give you a better understanding of what is going on. Additionally, the annotations for these pathways are highly contextual and may overlap a lot, so I also recommend looking at which of your genes have mapped to which pathways and deciding whether each pathway actually applies to your experiment.
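One way to check which genes mapped to which pathways is to list the hits per pathway, split by direction. The snippet below is only a sketch of that bookkeeping in Python, with hypothetical gene names, fold changes, and pathway annotations; `hits_by_direction` is an illustrative helper, not a function from any package.

```python
# Hypothetical toy inputs: DE genes with log2 fold changes, and pathway annotations.
de_l2fc = {"ATG5": 1.8, "ATG7": 1.2, "BCL2": -1.5, "GENE_X": -0.9}
pathways = {
    "autophagy": {"ATG5", "ATG7", "BCL2"},
    "apoptosis": {"BCL2", "GENE_Y"},
}

def hits_by_direction(de_l2fc, pathways):
    """For each pathway, list the DE genes annotated to it,
    split by the sign of their log2 fold change."""
    report = {}
    for name, members in pathways.items():
        # Genes absent from de_l2fc default to 0 and land in neither list.
        up = sorted(g for g in members if de_l2fc.get(g, 0) > 0)
        down = sorted(g for g in members if de_l2fc.get(g, 0) < 0)
        report[name] = {"up": up, "down": down}
    return report

report = hits_by_direction(de_l2fc, pathways)
for name, hits in report.items():
    print(name, "up:", hits["up"], "down:", hits["down"])
```

Here BCL2 shows up under both pathways, which is the kind of annotation overlap worth checking by eye before deciding whether an enriched term is relevant.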
