Question: key lookup changes with AnnotationDbi version
james wrote:

I thought that all of the information about each chipset (i.e. platform) was in the corresponding R package.

For example, hgu219.db is the annotation package for the hgu219 platform.

However, my key lookup results differ depending on the package version of AnnotationDbi, even when the hgu219.db package versions are the same.

So for example,

keys(hgu219.db, keytype  = 'UNIPROT')

gives a different list of UNIPROTs depending on the AnnotationDbi version.

I thought all of the info was in the hgu219.db package. My thinking must be incorrect?

Can someone explain why this is happening?

Tags: bioconductor, R
modified 13 days ago by benformatics • written 14 days ago by james

Try packageVersion("hgu219.db") to check the exact version of hgu219.db rather than guessing.

modified 14 days ago • written 14 days ago by MatthewP

Yes, of course. This is exactly what I did. On both systems:

> packageVersion('hgu219.db')
[1] ‘3.2.3’
written 14 days ago by james

Can you provide the versions of R, Bioconductor, and the hgu219.db and AnnotationDbi packages you are using on each computer?

written 14 days ago by Lluís R.

OK, but as explained in my question, I don't see why anything besides the hgu219.db version is relevant.

Computer 1:

> R.Version()$version.string
[1] "R version 3.6.3 (2020-02-29)"
> library(BiocManager)
Bioconductor version 3.10 (BiocManager 1.30.10), ?BiocManager::install for help
Bioconductor version '3.10' is out-of-date; the current release version '3.11' is available with R version '4.0';     
see https://bioconductor.org/install
> packageVersion('hgu219.db')
[1] ‘3.2.3’
> packageVersion('AnnotationDbi')
[1] ‘1.48.0’

Computer 2:

> R.Version()$version.string
[1] "R version 3.6.2 (2019-12-12)"
> library(BiocManager)
Bioconductor version 3.9 (BiocManager 1.30.4), ?BiocManager::install for help
Bioconductor version '3.9' is out-of-date; the current release version '3.11'
is available with R version '4.0'; see https://bioconductor.org/install
> packageVersion('hgu219.db')
[1] ‘3.2.3’
> packageVersion('AnnotationDbi')
[1] ‘1.46.0’
written 14 days ago by james

And what do you get when you compare the keys? For example, setdiff(computer1_keys, computer2_keys).

written 14 days ago by MatthewP
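
A minimal base-R sketch of the comparison suggested above, assuming each machine's UNIPROT key list has been saved to a file (for example with saveRDS()) and read back on one machine. compare_keys() and the file names are hypothetical, and the toy vectors only stand in for the real key lists.

```r
# compare_keys() is a hypothetical helper, not part of AnnotationDbi.
# It reports the keys unique to each vector and whether b is a subset of a.
compare_keys <- function(keys_a, keys_b) {
  list(
    only_in_a     = setdiff(keys_a, keys_b),  # in a but not in b
    only_in_b     = setdiff(keys_b, keys_a),  # in b but not in a
    b_subset_of_a = all(keys_b %in% keys_a)   # TRUE if every key in b is also in a
  )
}

# On each machine (requires hgu219.db; file names are hypothetical):
#   library(hgu219.db)
#   saveRDS(keys(hgu219.db, keytype = "UNIPROT"), "uniprot_keys_computer1.rds")
# Then, on one machine (older installation first):
#   res <- compare_keys(readRDS("uniprot_keys_computer2.rds"),
#                       readRDS("uniprot_keys_computer1.rds"))

# Toy vectors standing in for the real key lists (hypothetical values)
old_keys <- c("Q6ZP68", "KEY_A", "KEY_B")
new_keys <- c("KEY_A", "KEY_B")
res <- compare_keys(old_keys, new_keys)
res$only_in_a      # "Q6ZP68"
res$b_subset_of_a  # TRUE
```

With the real lists, length(res$only_in_a) counts the keys present only on the older installation, and b_subset_of_a tells you whether the newer list is a strict subset.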

Sorry, but it sounds like you don't know the answer to my question?

One has a few hundred more UNIPROTs than the other.

written 13 days ago by james

Exactly 361 more UNIPROTs with the older version of AnnotationDbi; that is, the newer version returns a subset of the older version's keys.

written 13 days ago by james

As @MatthewP observed, the new version of AnnotationDbi is dropping a few hundred UNIPROTs.

Just guessing, but is AnnotationDbi keeping a list of "stale" UNIPROTs?

written 13 days ago by james
Lluís R. (Barcelona, Spain) wrote:

You are using different Bioconductor versions on the two computers, which affects the underlying data for hgu219.db. For instance, the hgu219.db version on computer 2 is newer than its AnnotationDbi version, which might be one of the reasons for this, because hgu219.db relies on data and methods provided by AnnotationDbi and other Bioconductor packages for that release.

Run BiocManager::valid() and follow its advice to set up a valid Bioconductor installation. If you want consistent results between the two computers, use the same Bioconductor version on both.
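
To compare the two machines side by side, a small base-R sketch that records the relevant versions. collect_versions() is a hypothetical helper, and the default package list is an assumption (org.Hs.eg.db is included because the UNIPROT lookups may go through it). For the Bioconductor release itself, BiocManager::version() and BiocManager::valid() remain the tools to use.

```r
# collect_versions() is a hypothetical helper: it returns a named character
# vector with the R version string and the version of each requested package.
collect_versions <- function(pkgs = c("AnnotationDbi", "hgu219.db", "org.Hs.eg.db")) {
  pkg_version <- function(p) {
    # packageVersion() errors if the package is not installed; report NA instead
    tryCatch(as.character(utils::packageVersion(p)),
             error = function(e) NA_character_)
  }
  c(R = R.version.string, vapply(pkgs, pkg_version, character(1)))
}

# Run on each machine and compare the printed vectors.
# The Bioconductor release can be added separately, e.g. BiocManager::version().
collect_versions()
```

Any mismatch in the package rows, not just in hgu219.db, is a candidate explanation for differing keys.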

written 13 days ago by Lluís R.
benformatics (ETH Zurich) wrote:

Based on my limited investigation, I don't think it's AnnotationDbi; more likely it's the version of the org.Hs.eg.db package you have installed. I think the lookups mainly go through org.Hs.eg.db, matching on the Entrez ID. If you swap hgu219.db for org.Hs.eg.db, the output is the same. I didn't see any UNIPROT IDs stored directly in the hgu219.db package.
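
A toy sketch of the two-step lookup described above, using made-up data frames: probe to Entrez ID from the chip package, then Entrez ID to UniProt from the organism package. It only models the idea with base R merge(); it is not how AnnotationDbi actually runs the query. The IDs 400165 and Q6ZP68 are taken from this thread; every other ID is hypothetical.

```r
# Hypothetical chip-package mapping: probe -> Entrez ID
probe_to_entrez <- data.frame(
  PROBEID  = c("probe_1", "probe_2"),  # hypothetical probe IDs
  ENTREZID = c("400165", "999999"),    # 999999 is a made-up Entrez ID
  stringsAsFactors = FALSE
)

# Two hypothetical versions of an organism-package mapping: Entrez ID -> UniProt
entrez_to_uniprot_old <- data.frame(
  ENTREZID = c("400165", "999999"),
  UNIPROT  = c("Q6ZP68", "TOY001"),    # TOY001 is a made-up accession
  stringsAsFactors = FALSE
)
# The "new" version has lost the 400165 -> Q6ZP68 row
entrez_to_uniprot_new <- entrez_to_uniprot_old[entrez_to_uniprot_old$ENTREZID != "400165", ]

lookup_uniprot <- function(chip_map, org_map) {
  # all.x = TRUE keeps probes whose Entrez ID has no UniProt entry (UNIPROT becomes NA)
  merge(chip_map, org_map, by = "ENTREZID", all.x = TRUE)
}

old <- lookup_uniprot(probe_to_entrez, entrez_to_uniprot_old)
new <- lookup_uniprot(probe_to_entrez, entrez_to_uniprot_new)

# UniProt keys reachable through the unchanged chip mapping under each version
lost <- setdiff(old$UNIPROT[!is.na(old$UNIPROT)], new$UNIPROT[!is.na(new$UNIPROT)])
lost  # "Q6ZP68"
```

Under this model the chip mapping never changes, yet an accession disappears from its UNIPROT keys because the organism-level table changed, which is consistent with hgu219.db being 3.2.3 on both machines.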

This does seem to be a potentially serious issue. Why the UniProt IDs were dropped (and which ones) is unclear. I see you have already posted about this on Bioconductor, which is good.

To expand on this, I looked at the setdiff between installations: in my case, additional IDs are also lost with the newer version, but not on the scale you reported.

As an example, take one accession, Q6ZP68: it disappears completely in the newer version, even though it is annotated as reviewed in UniProt.

Computer 1: hgu219.db_3.2.3, org.Hs.eg.db_3.11.4, AnnotationDbi_1.50.3

> select(hgu219.db, keys=c("Q6ZP68"),columns=c("SYMBOL","GENENAME","ENTREZID"), keytype="UNIPROT")
Error in .testForValidKeys(x, keys, keytype, fks) :
  None of the keys entered are valid keys for 'UNIPROT'. Please use the keys method to see a listing of valid arguments.
> select(hgu219.db, keys=c("ATP11AUN"),columns=c("GENENAME","ENTREZID","UNIPROT"), keytype="SYMBOL")
'select()' returned 1:1 mapping between keys and columns
    SYMBOL                 GENENAME ENTREZID UNIPROT
1 ATP11AUN ATP11A upstream neighbor   400165    <NA>

Computer 2: hgu219.db_3.2.3, org.Hs.eg.db_3.8.2, AnnotationDbi_1.46.

> select(hgu219.db, keys=c("Q6ZP68"),columns=c("SYMBOL","GENENAME","ENTREZID"), keytype="UNIPROT")
'select()' returned 1:1 mapping between keys and columns
  UNIPROT   SYMBOL                 GENENAME ENTREZID
1  Q6ZP68 ATP11AUN ATP11A upstream neighbor   400165
> select(hgu219.db, keys=c("ATP11AUN"),columns=c("GENENAME","ENTREZID","UNIPROT"), keytype="SYMBOL")
'select()' returned 1:1 mapping between keys and columns
    SYMBOL                 GENENAME ENTREZID UNIPROT
1 ATP11AUN ATP11A upstream neighbor   400165  Q6ZP68
modified 13 days ago • written 13 days ago by benformatics

The official response is that the missing annotations are NCBI's responsibility: the R packages just wrap the publicly available data.

https://support.bioconductor.org/p/134782/

written 10 days ago by benformatics