Hi,
I have an annotation file for a non-model species of Aspergillus that was generated from the best BLAST hit (against UniProt) for each entry. As a result, the UniProt IDs are not restricted to one organism but come from several (e.g. ATG12_ASPCL, UBC2_MEDSA, UBE2Z_HUMAN). I want to do pathway and gene set enrichment analysis, and for that I need all the transcripts identified with Entrez IDs from a single model organism. So far I have tried stripping the species code from the UniProt ID, leaving only the gene name (i.e. ATG12, UBC2, UBE2Z). I then used the bitr function from clusterProfiler to convert the IDs, using org.Sc.sgd.db for S. cerevisiae or org.Hs.eg.db for human, but 72% and 86% (respectively) of the annotated genes were not mapped.
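To make the stripping step concrete, here is a minimal sketch in Python of what I mean. The function name `split_entry_name` and the three-item list are only for illustration. The sketch also counts how many hits come from each species code. Note that the part before the underscore is a UniProt protein mnemonic, which does not always match the official gene symbol in a given organism, so this may be one reason for the low mapping rate.

```python
from collections import Counter


def split_entry_name(entry_name: str) -> tuple[str, str]:
    """Split a UniProt entry name like 'ATG12_ASPCL' into (mnemonic, species_code)."""
    # Split on the last underscore: the species code is always the final part.
    mnemonic, sep, species = entry_name.strip().rpartition("_")
    if not sep or not mnemonic or not species:
        raise ValueError(f"not a UniProt entry name: {entry_name!r}")
    return mnemonic, species


# Illustrative input: the three example IDs from the post.
entry_names = ["ATG12_ASPCL", "UBC2_MEDSA", "UBE2Z_HUMAN"]

pairs = [split_entry_name(name) for name in entry_names]
mnemonics = [mnemonic for mnemonic, _ in pairs]
species_counts = Counter(species for _, species in pairs)

print(mnemonics)       # candidate gene names to pass on to an ID converter
print(species_counts)  # how many best hits come from each species code
```

The species counts show which organisms dominate the best hits, which might help in choosing the model organism to map everything onto.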
Could anyone suggest a tool or a strategy to solve this issue? I hope I have explained my question properly. Thanks!
EDIT: Using UniProt's Retrieve/ID mapping tool you can convert the IDs to Entrez IDs, but the problem of having many (non-model) species for pathway analysis remains.
You can do this with the UniProt ID converter, although some IDs may still be unmappable.
Hi, I had just added that note to my question, because I already tried it. Thanks!