I need your help since I am very much in doubt about the right approach for gene enrichment analysis.
For my non-model organism, I have a background list of official gene IDs (originating from a Trinotate-annotated de novo transcriptome, which also assigned GO terms to each gene) and a list of differentially expressed genes (official gene IDs) on which I want to run a functional enrichment analysis.
My first approach was using the R-package clusterProfiler since I have the GO-terms from the Trinotate annotation. But the statistics seemed off (FDR, Bonferroni, Benjamini etc. all came out with the same values).
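To make clear what kind of test I mean, here is a minimal sketch of an over-representation analysis against a custom background, using only the Python standard library. It is not clusterProfiler's code; the function names (`hypergeom_sf`, `benjamini_hochberg`, `enrich`) and the toy gene IDs and GO terms are made up for illustration. It assumes the Trinotate annotation has already been parsed into a gene-to-GO-terms mapping.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k): N genes drawn from M background genes, n of which carry the term."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def enrich(gene2go, de_genes):
    """Test every GO term for over-representation among the DE genes.

    gene2go: dict of gene ID -> set of GO terms (this defines the background).
    Returns tuples of (term, DE genes with term, p, Bonferroni, BH).
    """
    background = set(gene2go)
    de = set(de_genes) & background  # DE genes without annotation are dropped
    terms = sorted({t for ts in gene2go.values() for t in ts})
    rows = []
    for term in terms:
        annotated = {g for g, ts in gene2go.items() if term in ts}
        k = len(annotated & de)
        rows.append((term, k, hypergeom_sf(k, len(background), len(annotated), len(de))))
    pvals = [p for _, _, p in rows]
    bonferroni = [min(1.0, p * len(pvals)) for p in pvals]
    bh = benjamini_hochberg(pvals)
    return [(t, k, p, b, f) for (t, k, p), b, f in zip(rows, bonferroni, bh)]

# Toy data, purely hypothetical gene IDs and GO terms.
toy_gene2go = {
    "g1": {"GO:A"}, "g2": {"GO:A"}, "g3": {"GO:B"},
    "g4": {"GO:B"}, "g5": {"GO:B"}, "g6": {"GO:C"},
}
toy_results = enrich(toy_gene2go, ["g1", "g2"])
```

With only a handful of terms and small p-values, the different corrections can end up close to each other or capped at 1, so identical adjusted values are not necessarily a bug in themselves.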
My current approach is DAVID - but I am very much in doubt about what is the "right thing" to do!
The background gene list and the list of differentially expressed genes were converted from official gene IDs into Entrez IDs, but each gene was assigned multiple Entrez IDs.
I have tried running the full converted list with multiple Entrez IDs per gene, and I have also removed "duplicates" and run the analysis with only one representative Entrez ID per gene. The results are very different, both in the number of enriched terms and in which GO terms come up.
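For clarity, this is roughly what I mean by removing "duplicates". It is only a sketch: the function name `collapse_entrez` and the example IDs are hypothetical, and picking the numerically smallest Entrez ID as the representative is an arbitrary assumption, chosen only so that the choice is reproducible.

```python
def collapse_entrez(gene_to_entrez):
    """Keep one representative Entrez ID per gene.

    gene_to_entrez: dict of gene ID -> iterable of Entrez IDs (as strings).
    Assumption: the numerically smallest ID is kept; genes with no ID are dropped.
    """
    representative = {}
    for gene, ids in gene_to_entrez.items():
        numeric = sorted({int(i) for i in ids})
        if numeric:
            representative[gene] = str(numeric[0])
    return representative

# Hypothetical conversion result for the background list.
toy_background = {"geneA": ["200", "15"], "geneB": ["7"], "geneC": []}
toy_collapsed = collapse_entrez(toy_background)

# The DE list is collapsed with the same mapping, so that every DE gene's
# representative ID is guaranteed to be present in the background as well.
toy_de = ["geneA"]
toy_de_ids = [toy_collapsed[g] for g in toy_de if g in toy_collapsed]
```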
I would very much like to hear what your approach to DAVID is: would you remove "duplicates" (i.e. use only one representative Entrez ID per gene symbol) or use the full converted list?
Best wishes, Birgitte