Question

Missing org.Hs.eg.db GO Annotations for UniProt IDs

Asked 6 months ago by Charlie

I have run into an issue when trying to do GO enrichment with clusterProfiler in combination with org.Hs.eg.db. In this analysis, I am interested in the set of proteins (labeled by their UniProt IDs) that are differentially abundant between two experimental conditions.

While doing this, I noticed that a large fraction of the UniProt IDs in my protein set (31 of 87) do not seem to have any information in the org.Hs.eg.db database. This is confusing to me, because on the UniProt website these IDs do have GO terms associated with them. So the GO data for these IDs exists; it just does not seem to be contained in org.Hs.eg.db. I am using the most recent version of org.Hs.eg.db (3.17.0).

For example, A0A075B6H9, A0A075B6I4, and A0A0B4J1Y9 fit this pattern.

My questions are:
1. Why is this the case?
2. What can I do about it?

Any help would be much appreciated. Thanks!
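
One way to separate "missing from the package entirely" from "present but without GO terms" is to compare the IDs against the UniProt keys that org.Hs.eg.db itself knows about. This is a minimal sketch, assuming org.Hs.eg.db and AnnotationDbi are installed; `find_unmapped()` is a hypothetical helper, not part of either package. Keep in mind that GO annotations in org.Hs.eg.db are attached to Entrez Gene IDs, so an accession that is present as a key can still end up with no GO terms if its gene has none in the package.

```r
# Return the query IDs that do not appear among a set of reference keys.
# (hypothetical helper for illustration)
find_unmapped <- function(query_ids, reference_keys) {
  setdiff(unique(query_ids), reference_keys)
}

# Only run the database lookup if the annotation packages are installed.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  # All UniProt accessions that org.Hs.eg.db has a record for
  uniprot_keys <- AnnotationDbi::keys(org.Hs.eg.db::org.Hs.eg.db,
                                      keytype = "UNIPROT")
  example_ids <- c("A0A075B6H9", "A0A075B6I4", "A0A0B4J1Y9")
  # Accessions the package does not know about at all
  print(find_unmapped(example_ids, uniprot_keys))
}
```

Any accession printed by the last line has no entry in the package, so no GO lookup through org.Hs.eg.db can succeed for it.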

How up to date is org.Hs.eg.db compared with the UniProt annotations? From the latest documentation:

Mappings were based on data provided by: Entrez Gene ftp://ftp.ncbi.nlm.nih.gov/gene/DATA With a date stamp from the source of: 2023-Mar05

So how outdated is this Entrez Gene info relative to UniProt?

Reply by Jean-Karim Heriche, 6 months ago
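
To see how old the data behind an installed copy actually is, one option is to read the metadata table stored in the package's SQLite database. This is a sketch that assumes such a table exists with `name` and `value` columns (as in AnnotationDbi-built annotation packages) and that DBI is installed; `source_dates()` is a hypothetical helper. Simply printing `org.Hs.eg.db` at the R prompt also lists this kind of information.

```r
# Keep only the rows of a name/value metadata table that record source dates.
# (hypothetical helper for illustration)
source_dates <- function(md) {
  md[grepl("SOURCEDATE$", md$name), c("name", "value"), drop = FALSE]
}

if (requireNamespace("org.Hs.eg.db", quietly = TRUE) &&
    requireNamespace("DBI", quietly = TRUE)) {
  # Connection owned by the annotation package; do not disconnect it.
  con <- AnnotationDbi::dbconn(org.Hs.eg.db::org.Hs.eg.db)
  # Assumed table layout: one row per metadata field (name, value)
  md <- DBI::dbGetQuery(con, "SELECT name, value FROM metadata")
  print(source_dates(md))
}
```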

As for point 2, why not then use GO annotations from UniProt directly?

Reply by Jean-Karim Heriche, 6 months ago
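
If you would rather not depend on a helper package, the annotations can be requested from the UniProt REST API directly. This is a minimal sketch assuming the `https://rest.uniprot.org/uniprotkb/stream` endpoint with `query`, `fields`, and `format=tsv` parameters; `uniprot_go_url()` and `fetch_uniprot_go()` are hypothetical helpers, and the download needs network access. If the TSV header for `go_id` is "Gene Ontology IDs", `check.names = TRUE` turns it into `Gene.Ontology.IDs`.

```r
# Build a UniProt REST URL returning the requested fields as TSV.
# (hypothetical helper; endpoint and parameter names assumed from the UniProt REST API)
uniprot_go_url <- function(accessions, fields = "accession,go_id") {
  # e.g. "accession:P1 OR accession:P2", percent-encoded for the URL
  query <- paste0("accession:", accessions, collapse = " OR ")
  paste0("https://rest.uniprot.org/uniprotkb/stream?format=tsv",
         "&fields=", fields,
         "&query=", utils::URLencode(query, reserved = TRUE))
}

# Download and parse the TSV; check.names turns spaces in headers into dots.
fetch_uniprot_go <- function(accessions) {
  utils::read.delim(uniprot_go_url(accessions), check.names = TRUE,
                    stringsAsFactors = FALSE)
}

# Usage (requires network access):
# uniprot_go <- fetch_uniprot_go(c("A0A075B6H9", "A0A075B6I4", "A0A0B4J1Y9"))
```

For a few hundred accessions a single request like this should be manageable; for much larger sets, splitting the IDs into batches keeps the URL length reasonable.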

Thanks for the answers! This is what I ended up doing.

In case it helps anyone else, here is how I did it:

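# Packages used: dplyr, tidyr, AnnotationDbi, GO.db, clusterProfiler, enrichplot
# GLabR is installed from GitHub (https://github.com/baynec2/GLabR)
library(dplyr)  # provides the %>% pipe

# `ids`: all UniProt accessions in the experiment (defined earlier)
# `up`:  UniProt accessions of the differentially abundant proteins (defined earlier)

# Annotate UniProt IDs with GO terms using the UniProt API (via GLabR)
uniprot_go = GLabR::annotate_uniprot_single(ids, columns = "accession,go_id")

# Format the GO terms into a long data frame: one row per protein/GO ID pair
# (separate_rows lives in tidyr, not dplyr; blank GO entries are dropped)
go = uniprot_go %>%
  tidyr::separate_rows(Gene.Ontology.IDs, sep = ";") %>%
  dplyr::mutate(Gene.Ontology.IDs = gsub(" ", "", Gene.Ontology.IDs)) %>%
  dplyr::filter(Gene.Ontology.IDs != "")

# Make the TERM2GENE data frame to use with enricher
TERM2GENE = go %>%
  dplyr::select(TERM = Gene.Ontology.IDs, GENE = Entry)

# Get the name of each GO term to use with enricher (columns: GOID, TERM)
TERM2NAME = AnnotationDbi::select(GO.db::GO.db,
                                  keys = unique(TERM2GENE$TERM),
                                  keytype = "GOID",
                                  columns = "TERM")

# Do the enrichment analysis; the universe is every queried protein
enrich = clusterProfiler::enricher(up,
                                   universe = unique(uniprot_go$Entry),
                                   TERM2GENE = TERM2GENE,
                                   TERM2NAME = TERM2NAME)

# Make a network plot (cnetplot comes from enrichplot)
plot = enrichplot::cnetplot(enrich, categorySize = "pvalue")

plot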

Reply by Charlie, 6 months ago
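
For anyone wondering what `enricher()` computes for each GO term: it runs an over-representation test based on the hypergeometric distribution, comparing the query set with the universe. The sketch below recomputes a one-sided hypergeometric p-value for a single term from TERM2GENE-style data. `term_counts()` and `ora_pvalue()` are hypothetical helpers, the toy IDs are made up, and clusterProfiler's own bookkeeping (which genes count as annotated, multiple-testing adjustment) is not reproduced here.

```r
# One-sided hypergeometric p-value: P(X >= k) for a single term.
# k: query proteins annotated to the term
# M: universe proteins annotated to the term
# N: universe size
# n: query set size (restricted to the universe)
ora_pvalue <- function(k, M, N, n) {
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

# Derive k, M, N, n for one term from a TERM/GENE data frame.
# (hypothetical helper for illustration)
term_counts <- function(term, term2gene, query, universe) {
  universe <- unique(universe)
  query <- intersect(unique(query), universe)
  term_genes <- intersect(unique(term2gene$GENE[term2gene$TERM == term]),
                          universe)
  c(k = length(intersect(query, term_genes)),
    M = length(term_genes),
    N = length(universe),
    n = length(query))
}

# Toy example with made-up IDs: 4-protein universe, 2-protein query set
toy <- data.frame(TERM = c("GO:A", "GO:A", "GO:B"),
                  GENE = c("p1", "p2", "p2"),
                  stringsAsFactors = FALSE)
counts <- term_counts("GO:A", toy, query = c("p1", "p2"),
                      universe = c("p1", "p2", "p3", "p4"))
ora_pvalue(counts[["k"]], counts[["M"]], counts[["N"]], counts[["n"]])  # 1/6
```

The choice of universe directly changes N and M, which is why passing all detected proteins (rather than the whole proteome) matters for proteomics data.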