ORA (over-representation analysis): different packages give different adjusted p-values and q-values
9 weeks ago
camillab. ▴ 130


Apologies for the stupid question, but I think I am doing something wrong and I do not understand what. I would like to run ORA on a bulk RNA-seq dataset, so I tried both clusterProfiler and genekitr. Although I get the same terms, the adjusted p-values and q-values differ: with clusterProfiler practically none of the terms has an adjusted p-value or q-value <= 0.01, whereas with genekitr a few do. Why is that? Am I doing something wrong in my code?

for clusterProfiler:

library(clusterProfiler)
library(org.Hs.eg.db)

# we want the log2 fold change
original_gene_list <- d$log2FC # on the unfiltered dataset

# name the vector
names(original_gene_list) <- d$ENSEMBL

# omit any NA values
gene_list <- na.omit(original_gene_list)

# sort the list in decreasing order (required for clusterProfiler)
gene_list <- sort(gene_list, decreasing = TRUE)

# Extract significant results (p_value <= 0.05)
sig_genes_df = subset(d, p_value <= 0.05)

# From significant results, we want to filter on log2fold change
genes <- sig_genes_df$log2FC

# Name the vector
names(genes) <- sig_genes_df$ENSEMBL

# omit NA values
genes <- na.omit(genes)

# filter on minimum absolute log2 fold change (|log2FC| > 1.5)
genes <- names(genes)[abs(genes) > 1.5]

go_enrich <- enrichGO(gene = genes,
                      universe = names(gene_list),
                      OrgDb = org.Hs.eg.db, 
                      keyType = "ENSEMBL",
                      readable = TRUE,
                      ont = "BP",
                      pvalueCutoff = 0.05, 
                      qvalueCutoff = 0.01)

and for genekitr I used this code (section 1.7):

library(genekitr)

# 1st step: get input IDs
id <- c(dpg6$Associated.Gene.Name) # DEGs

# 2nd step: get gene set 
gs2 <- geneset::getGO(org = "human",ont = "bp") # biological process

ego2 <- genORA(id,
               geneset = gs2,
               universe = names(d$ENSEMBL), # background, i.e. the unfiltered dataset
               p_cutoff = 0.05,
               q_cutoff = 0.01) # bp

What am I doing wrong?

Thank you very much for your help!


9 weeks ago
chaco001 ▴ 40

This could be due to a few different things.

  1. It isn't completely clear from your example whether id and genes are the same list, which they would need to be to expect the same results.

  2. Similarly, it seems like the universes given are slightly different, which affects the hypergeometric test.

  3. It could be that the GO-BP databases are different versions.

  4. Finally, the docs for genekitr (while a bit confusingly written) also show that the two approaches yield different results. I'm not sure I'm parsing their explanation fully, but it seems to be due to a slight difference in the genes used for the test. https://www.genekitr.fun/ora-analysis-1.html#ora-tools-comparsion
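
To make point 2 concrete, here is a minimal base-R sketch of the hypergeometric test behind ORA. The helper `ora_pvalue` and all counts are made up for illustration and are not taken from either package; the only point is that the same overlap can give a different p-value when the background size changes.

```r
# Over-representation p-value for one term via the hypergeometric test.
# k: DEGs annotated to the term, n: total DEGs,
# M: universe genes annotated to the term, N: universe size.
# (ora_pvalue is a hypothetical helper, not part of clusterProfiler or genekitr.)
ora_pvalue <- function(k, n, M, N) {
  # P(X >= k) when n genes are drawn from N, of which M belong to the term
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

# Hypothetical counts: same term, same DEGs, two different backgrounds
p_small_universe <- ora_pvalue(k = 12, n = 200, M = 150, N = 12000)
p_large_universe <- ora_pvalue(k = 12, n = 200, M = 150, N = 20000)

print(c(small = p_small_universe, large = p_large_universe))
```

With the larger background the expected overlap by chance is smaller, so the same 12 hits look more surprising and the p-value drops.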

Unrelated, while I have some clients that ask for ORA, I strongly prefer GSEA, because I don't have to do things like choose thresholds. Good luck!


Thank you! The universe and the genes are the same; I just used different names because the scripts were written at different times. However, I re-ran both scripts using the same genes and names, and the result is the same as before (different p- and q-values). How do I choose which method to use? I don't want to choose genekitr just because it gives me more statistically significant terms that match my theory if it is not the right approach!
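
One way to double-check that the inputs really match is a quick comparison in base R. This is only a sketch: the data frame and gene vectors below are hypothetical stand-ins for the real objects. It also shows that `names()` on an unnamed data-frame column returns `NULL`, so `universe = names(d$ENSEMBL)` may not pass the background that was intended; how genORA treats a `NULL` universe is worth checking in its documentation. The ID types matter too: the genekitr input above is gene symbols, while the background is Ensembl IDs.

```r
# Hypothetical stand-ins for the real inputs of the two scripts
d <- data.frame(ENSEMBL = c("ENSG01", "ENSG02", "ENSG03"),
                log2FC  = c(2.1, -0.3, 1.8))
genes_cp <- c("ENSG01", "ENSG03")   # what goes into enrichGO(gene = ...)
genes_gk <- c("ENSG03", "ENSG01")   # what goes into genORA(id, ...)

# Same DEG set, regardless of order?
same_genes <- setequal(genes_cp, genes_gk)

# names() of an unnamed column is NULL; the column itself holds the IDs
universe_from_names  <- names(d$ENSEMBL)
universe_from_column <- unique(d$ENSEMBL)

print(same_genes)
print(is.null(universe_from_names))
print(length(universe_from_column))
```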


Hi, I'm the author of genekitr. Thanks for your feedback. Regarding your question: first, both enrichGO and genORA are based on the enricher function for the statistical calculations. As @chaco001 said, the main difference lies in the term annotation used as input, which of course is not limited to GO. clusterProfiler mainly adopts the OrgDb method (for example, the function uses org.Hs.eg.db to obtain the gene sets), while genekitr integrates Panther db (v17.0) and OrgDb.
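
A rough illustration of why a different annotation source can shift adjusted p-values even when the raw p-values for shared terms agree: the Benjamini-Hochberg correction depends on how many terms are tested in total. The numbers below are invented, and padding the rest of the list with p = 1 is purely an assumption to keep the example deterministic; it is not how either package works internally.

```r
# Hypothetical raw p-values for five terms reported by both tools
shared_p <- c(1e-6, 1e-5, 1e-4, 5e-4, 1e-3)

# Assumption for illustration: tool A tests 500 terms in total, tool B 3000.
# The remaining terms are given p = 1 so the result is deterministic.
p_tool_a <- c(shared_p, rep(1, 495))
p_tool_b <- c(shared_p, rep(1, 2995))

# Benjamini-Hochberg adjustment, then keep the five shared terms
adj_a <- p.adjust(p_tool_a, method = "BH")[1:5]
adj_b <- p.adjust(p_tool_b, method = "BH")[1:5]

print(data.frame(raw = shared_p, adj_a = adj_a, adj_b = adj_b))
```

The same raw p-values end up with larger adjusted values when more terms are tested, so fewer of them pass a fixed cutoff such as 0.01.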


I love using these tools! They are both easy to use.

9 weeks ago

Don't worry about it; the p-values and q-values in these tools are mostly "make-believe".

The problems with ORA analyses are so profound and fundamental that the p-values are almost meaningless.

Think of them as educated guesses and opinions.

