Fold change cut off enrichment analysis
5.5 years ago
Pietro ▴ 240

I would like to hear your opinion about this approach.

I am doing over-representation analysis on differentially expressed genes from RNA-Seq.

Instead of running a single test on, say, up-regulated genes above a log2FC cutoff of x, I am running separate tests within different cutoff intervals. For example, one test for up-regulated genes with log2FC between 0 and 0.5, another for those with log2FC between 0.5 and 1, and so on. I then do the same separately for negative fold changes.

Afterwards I do not pool the results; I just compare the terms found enriched in the different cutoff intervals.

This way I check whether genes that are significantly up- or down-regulated with similar intensity are enriched for specific terms/ontologies that might be missed when using a single cutoff value.
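
A minimal sketch of the binned approach described above, assuming a one-sided hypergeometric test is used for over-representation. The function names (`hypergeom_sf`, `binned_ora`), the interval edges, and the input layout are hypothetical and only illustrate the idea; a real analysis would normally rely on an established enrichment package and correct for multiple testing across terms and bins.

```python
from math import comb


def hypergeom_sf(hits, draws, successes, population):
    """P(X >= hits) when `draws` genes are sampled without replacement from
    `population` genes, of which `successes` are annotated to the term."""
    upper = min(draws, successes)
    numerator = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(hits, upper + 1)
    )
    return numerator / comb(population, draws)


def binned_ora(log2fc, term_genes, universe, edges):
    """Run one over-representation test per [low, high) log2FC interval.

    log2fc     : dict gene -> log2 fold change (significant genes only)
    term_genes : set of genes annotated to a single term
    universe   : set of all background genes
    edges      : sorted interval boundaries, e.g. [0, 0.5, 1.0]
    Returns {(low, high): (genes_in_bin, term_hits, unadjusted_p)}.
    """
    term_in_universe = term_genes & universe
    results = {}
    for low, high in zip(edges, edges[1:]):
        bin_genes = {
            g for g, fc in log2fc.items()
            if low <= fc < high and g in universe
        }
        hits = len(bin_genes & term_in_universe)
        p = hypergeom_sf(
            hits, len(bin_genes), len(term_in_universe), len(universe)
        )
        results[(low, high)] = (len(bin_genes), hits, p)
    return results
```

Negative fold changes could be handled by calling the same function with negative edges, for example `[-1.0, -0.5, 0]`, so that up- and down-regulated genes are never mixed in one bin.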

Hope I was clear

Pietro

RNA-Seq enrichment ontology fold change cutoff • 5.2k views
ADD COMMENT
5.5 years ago
shawn.w.foley ★ 1.3k

It sounds like you're looking for gene sets that are consistently up- or down-regulated, in which case GSEA might be a more appropriate analysis. GSEA doesn't apply any arbitrary cutoffs; in fact, it takes all expressed genes as input regardless of fold change. It then tests for consistent changes in gene expression (or whatever metric you're using).
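
To make the "no cutoff" idea concrete, here is a simplified sketch of the running-sum statistic behind GSEA-style methods. It is an unweighted, Kolmogorov-Smirnov-like score and not the full GSEA procedure: the real method weights each step by the ranking metric and assesses significance by permutation, both of which are omitted here. The function name `enrichment_score` is hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    ranked_genes : all expressed genes, sorted from most up-regulated to
                   most down-regulated (no fold-change cutoff applied)
    gene_set     : genes annotated to the term of interest
    Returns the running-sum value with the largest absolute deviation
    from zero; positive means the set clusters near the top of the list.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on a set member, step down otherwise; the walk ends at 0.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

In this toy setting, a set concentrated at the top of the ranking scores close to +1 and one concentrated at the bottom close to -1, while a set split between both ends gets a score of smaller magnitude.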

If you're interested in testing different cutoffs for GO term enrichment, I would lean more towards testing various lower-limit cutoffs instead of both lower and upper limits. Take everything with log2FC > 0.58 (1.5-fold change), then everything with log2FC > 1 (2-fold change), etc. I think that would make more sense than arbitrarily choosing both an upper and a lower limit.
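
The lower-limit variant can be sketched as follows. The helper name `genes_above` and the third threshold (4-fold) are illustrative assumptions; only the 1.5-fold and 2-fold cutoffs come from the suggestion above.

```python
from math import log2


def genes_above(log2fc, fold_changes=(1.5, 2.0, 4.0)):
    """Return {fold_change: genes with log2FC > log2(fold_change)}.

    log2fc       : dict gene -> log2 fold change (significant genes only)
    fold_changes : linear fold-change lower limits; 4.0 is an
                   illustrative extra value, not taken from the thread
    """
    return {
        fc: {gene for gene, value in log2fc.items() if value > log2(fc)}
        for fc in fold_changes
    }
```

Because only a lower limit is applied, each gene list is a subset of the previous one, so every list can be passed to the same enrichment test and the results compared across thresholds.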

The GO enrichment and GSEA analyses would then be complementary, testing different aspects of your differentially expressed genes.

ADD COMMENT

Hi Shawn, thanks for your answer.

I know what GSEA is, but I am not sure it can answer my question for this particular hypothesis.

My question was more like "Do genes that are significantly up-regulated within a specific FC interval show enrichment for some categories/ontologies/terms, compared to genes significantly up-regulated within different FC intervals?".

Using made-up numbers as an example, let's say I have 70 genes that are significantly up-regulated within an FC interval of 1 to 1.5. Of these, 50 genes belong to ontology A. Within my universe/background (~15,000 genes), there are 90 genes belonging to ontology A. Gene ontology enrichment testing of these 70 genes gives a very significant value for ontology A. Then I take all the significantly up-regulated genes that have FC > 1, which are, let's say, 1000 genes. Gene ontology enrichment testing using these 1000 genes shows no significance for ontology A.
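
The numbers in this example can be checked with a one-sided hypergeometric test. This is only a sketch: the choice of test and the helper name `upper_tail` are assumptions. It also assumes the 1000 genes with FC > 1 include the 70 genes from the 1 to 1.5 interval, so that list contains at least 50 genes from ontology A.

```python
from math import comb


def upper_tail(hits, draws, annotated, universe):
    """P(X >= hits) for a hypergeometric draw of `draws` genes from
    `universe` genes, of which `annotated` belong to the ontology."""
    upper = min(draws, annotated)
    numerator = sum(
        comb(annotated, k) * comb(universe - annotated, draws - k)
        for k in range(hits, upper + 1)
    )
    return numerator / comb(universe, draws)


UNIVERSE = 15000   # background genes
ANNOTATED = 90     # background genes in ontology A

# Interval test: 70 genes, 50 of them in ontology A.
expected_interval = 70 * ANNOTATED / UNIVERSE
p_interval = upper_tail(50, 70, ANNOTATED, UNIVERSE)

# Cumulative test: 1000 genes; ASSUMED to contain the 70 interval genes,
# so 50 is the minimum possible number of ontology A hits.
expected_cumulative = 1000 * ANNOTATED / UNIVERSE
p_cumulative = upper_tail(50, 1000, ANNOTATED, UNIVERSE)

print(f"interval:   expected {expected_interval:.2f} hits, p = {p_interval:.3g}")
print(f"cumulative: expected {expected_cumulative:.2f} hits, p = {p_cumulative:.3g}")
```

By chance alone one would expect about 0.42 ontology A genes among 70 and about 6 among 1000, so 50 hits is far above expectation in both lists. Under the stated assumption, both p-values fall well below any conventional threshold, although the cumulative one is much weaker. With these particular numbers the larger list dilutes the signal without removing it; different numbers could behave differently.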

I would like to understand if this makes sense from a biological perspective.

ADD REPLY

So it sounds like you're concerned about losing true signal by looking at too large a group of genes. My concern would be false positives/negatives based on the arbitrary nature of the cutoffs. I'd still argue in favor of GSEA over binning the genes by log2FC cutoffs.

In your example case you have 50/90 genes from ontology A that are between 1 and 1.5-fold upregulated. If this were the case, then both your modified GO term enrichment and GSEA would find significance. A better question would be whether you're still interested in this result if the other 40 genes are between 1 and 1.5-fold downregulated. With the method you've described, an ontology whose genes are both up and down would be found significant in both the up-regulated and the down-regulated analysis, whereas with GSEA it would not be significant.

One workaround would be to filter out gene ontologies that are both up- and down-regulated, but is that fair? If you have an ontology that is 5-10 fold upregulated and 0.5-1.0 fold downregulated, would you call that ontology upregulated or unchanged? GSEA would probably call that upregulated, but then what cutoffs are appropriate? What about 20-fold upregulated and 5-fold downregulated? Is that significant/meaningful?

I think you're asking an interesting question regarding ontologies within fold change intervals; I'd just be worried that the arbitrary nature of the cutoffs can lead to some false positives/negatives. If this is simply an in-house analysis for filtering or discovery work, then I don't see much harm. But if you're hoping to publish this analysis or downstream findings, it might be difficult to defend this method to reviewers. GSEA using gene ontologies has its own drawbacks, but it's well established and accepted in the field, and I can't imagine a reviewer giving it too much difficulty.

ADD REPLY

Arbitrary cut-offs can be misleading; therefore, in such a case one also needs to join neighboring intervals so that they cover the maximum range of values (low to high expression), and then perform enrichment analysis. In my opinion, binning by log2FC values certainly adds more relevance to the whole analysis by bringing the expression values into the picture.

ADD REPLY