14 months ago
Manuel Sokolov Ravasqueira
Regarding over-representation analysis (ORA) after identifying differentially expressed genes: I have a set of 1550 genes resulting from an RNA-seq experiment. Given this number, I decided to do ORA instead of GSEA.
Having chosen ORA, I have to pass enrichGO a list filtered by fold change:
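library(clusterProfiler)
library(org.Hs.eg.db)
# gene_list is assumed to be a named numeric vector of fold changes, with gene symbols as names
genes <- names(gene_list[abs(gene_list) > 2])
go_enrich <- enrichGO(gene = genes,
universe = names(gene_list),  # background must be gene identifiers, not the fold-change values
OrgDb = org.Hs.eg.db,
keyType = "SYMBOL",
ont = "ALL",
pAdjustMethod = "fdr",
pvalueCutoff = 0.01,
qvalueCutoff = 0.05,
readable = TRUE)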
In this situation the threshold is > 2; however, some researchers use a value of 1. Which threshold gives the most accurate results? What are the best practices for choosing this number?
Best Regards
Personally, since we are looking for a statistical enrichment, I tend to use no LFC filter, or only a very mild one (e.g. 0.2-0.5), if the purpose of the gene list is enrichment analysis. I would use a different filter if the DE genes themselves were the final product of the analysis.
This is a really great point that I didn't think about the first time I read through this.
I agree: in this case, at least at first, you'd be well advised not to filter. You can bring a filter in later on if for some reason it seems it could help.
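To illustrate why a strict filter can cost power, here is a minimal sketch of the one-sided hypergeometric test that ORA tools such as enrichGO are based on. All counts are made-up toy numbers, not results from the data in the question; both scenarios have the same 3% of the submitted list annotated to the term, only the list size differs.

```r
# Toy numbers, all hypothetical
N <- 15000  # genes in the universe (background)
K <- 200    # universe genes annotated to the GO term of interest

# ORA p-value: P(X >= k) when drawing n genes from the universe at random
ora_p <- function(k, n) {
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# Mild filter: larger list, 15 of 500 genes hit the term (3%)
p_mild <- ora_p(k = 15, n = 500)

# Strict filter: smaller list, 3 of 100 genes hit the term (also 3%)
p_strict <- ora_p(k = 3, n = 100)

print(c(mild = p_mild, strict = p_strict))
```

With the same proportion of hits, the shorter list gives a weaker p-value, which is one reason to start with no filter or a mild one when the goal is enrichment.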
There is no correct answer to this question, apart from what is meaningful to the experimenter. Generally, if others have published results on the same phenotype, you could try a variety of values between 1 and 2 and see which one recapitulates the published results you trust.
One could also say that there is no correct answer, only whatever maximizes statistical power; but without knowing more about the experiment, it is not possible to say what would maximize statistical power either.
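Trying a variety of cutoffs, as suggested above, could be sketched roughly like this. The gene names and fold-change values are invented for illustration, and the cutoff grid is only an assumption; with real data, gene_list would be the named fold-change vector from the question.

```r
# Hypothetical named vector of log fold changes (toy values, not real data)
gene_list <- c(TP53 = 2.4, MYC = -1.3, EGFR = 0.4,
               BRCA1 = -2.1, GAPDH = 0.1, KRAS = 1.6)

# Candidate absolute fold-change cutoffs to compare
cutoffs <- c(0, 0.5, 1, 1.5, 2)

# For each cutoff, keep the names of genes whose |LFC| exceeds it
genes_by_cutoff <- lapply(cutoffs, function(cut) {
  names(gene_list)[abs(gene_list) > cut]
})
names(genes_by_cutoff) <- as.character(cutoffs)

# Number of genes that survive each cutoff
n_genes <- vapply(genes_by_cutoff, length, integer(1))
print(n_genes)

# Each element of genes_by_cutoff could then be passed as `gene` to
# enrichGO(), keeping universe = names(gene_list) fixed so that only
# the cutoff changes between runs.
```

Comparing the enriched terms across cutoffs, against published results you trust, then gives some basis for picking one value rather than choosing it arbitrarily.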