Hi, I am trying to do GO and KEGG enrichment analysis with the R package clusterProfiler. For the KEGG analysis I converted gene IDs (ENSEMBL → UniProt) with the `bitr` function, converting to ENTREZ IDs at the same time. However, `bitr` sometimes returns multiple IDs for a single gene. I assumed I should pick one ID per gene for the enrichment analysis, so my question is: how do people select the appropriate ID when there are multiple matches? Do I need to do it manually, checking each returned ID on the UniProt website (e.g. judging by the annotation score)? That is very laborious. How does everyone deal with this problem? (Or is it unnecessary to pick a single ID per gene in the first place?)
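A quick way to see how many genes are affected is to count rows per input ID in the `bitr` output, which has one row per match. A minimal base-R sketch, assuming the output has `ENSEMBL` and `UNIPROT` columns; the IDs below are hypothetical placeholders, not real accessions:

```r
# Toy stand-in for bitr() output: one row per (input ID, mapped ID) pair.
# All IDs are hypothetical placeholders.
mapped <- data.frame(
  ENSEMBL = c("geneA", "geneA", "geneA", "geneB", "geneC", "geneC"),
  UNIPROT = c("UP_001", "UP_002", "UP_003", "UP_004", "UP_005", "UP_006"),
  stringsAsFactors = FALSE
)

# Number of mapped IDs returned for each input gene
hits_per_gene <- table(mapped$ENSEMBL)
print(hits_per_gene)

# Input genes that came back with more than one ID
multi_mapped <- unique(mapped$ENSEMBL[duplicated(mapped$ENSEMBL)])
print(multi_mapped)
```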
Follow the detailed workflow here: https://github.com/twbattaglia/RNAseq-workflow. The ID-mapping step there looks like this:
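```r
library(org.Mm.eg.db)  # mouse annotation package; also loads AnnotationDbi, which provides mapIds()

# Add ENTREZ ID (row names of `results` must be gene symbols, since keytype = "SYMBOL")
results$entrez <- mapIds(x = org.Mm.eg.db, keys = row.names(results), column = "ENTREZID", keytype = "SYMBOL", multiVals = "first")
```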
For starters, don't bother converting ENSEMBL to UniProt; the workflow above maps to Entrez IDs instead. In the guide, the author sets `multiVals = "first"`, which the AnnotationDbi documentation describes as: "This value means that when there are multiple matches only the 1st thing that comes back will be returned. This is the default behavior." I have seen this used quite a lot in workflows, so I assumed it is OK. If you want a different behavior (e.g. `"list"`, `"filter"` or `"asNA"`), check out the `multiVals` argument here: https://www.rdocumentation.org/packages/AnnotationDbi/versions/1.30.1/topics/AnnotationDb-objects
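If you stay with `bitr`, which returns a data frame with one row per match rather than taking a `multiVals` argument, you can apply the same choices afterwards. A minimal base-R sketch on hypothetical IDs: `!duplicated()` keeps the first row per gene (like `"first"`), and the second filter drops ambiguous genes entirely (close to `"asNA"`, which returns NA for them instead of dropping them). Note that "first" only means first in the order the annotation database returns rows, not the best-annotated entry.

```r
# Toy stand-in for bitr() output (hypothetical IDs): geneA and geneC
# each map to several UniProt IDs, geneB maps to exactly one.
mapped <- data.frame(
  ENSEMBL = c("geneA", "geneA", "geneA", "geneB", "geneC", "geneC"),
  UNIPROT = c("UP_001", "UP_002", "UP_003", "UP_004", "UP_005", "UP_006"),
  stringsAsFactors = FALSE
)

# Analogue of multiVals = "first": keep the first row returned for each gene
first_hit <- mapped[!duplicated(mapped$ENSEMBL), ]

# Rough analogue of multiVals = "asNA": drop genes with more than one match
n_hits <- ave(seq_along(mapped$ENSEMBL), mapped$ENSEMBL, FUN = length)
unique_only <- mapped[n_hits == 1, ]

print(first_hit)
print(unique_only)
```

The first option keeps every gene but picks an arbitrary ID for ambiguous ones; the second keeps only unambiguous mappings at the cost of losing some genes from the enrichment input.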
EDIT: Once you get the hang of that workflow, move on to this one: https://yulab-smu.github.io/clusterProfiler-book/chapter12.html