Question: key lookup changes with AnnotationDbi version
james wrote:

I thought that all of the information about each chipset (i.e. platform) was in the corresponding R package.

For example, hgu219.db is the annotation package for the hgu219 platform.

However, my key lookup results differ depending on the package version of AnnotationDbi, even when the hgu219.db package versions are the same.

So for example,

keys(hgu219.db, keytype = 'UNIPROT')

gives a different list of UNIPROTs depending on the AnnotationDbi version.

I thought all of the info was in the hgu219.db package. My thinking must be incorrect?

Can someone explain why this is happening?

Tags: bioconductor, R
Written 4 months ago by james • modified 4 months ago by benformatics

Use packageVersion("hgu219.db") to check the exact version of hgu219.db rather than guessing.

Reply written 4 months ago by MatthewP • modified 4 months ago

Yes, of course. This is exactly what I did. On both systems:

> packageVersion('hgu219.db')
[1] ‘3.2.3’
Reply written 4 months ago by james

Can you provide the versions of R, Bioconductor, hgu219.db, and AnnotationDbi that you are using on each computer/platform?

Reply written 4 months ago by Lluís R.

OK, but as explained in my question, I don't see why anything besides the hgu219.db version is relevant.

Computer 1:

> R.Version()$version.string
[1] "R version 3.6.3 (2020-02-29)"
> library(BiocManager)
Bioconductor version 3.10 (BiocManager 1.30.10), ?BiocManager::install for help
Bioconductor version '3.10' is out-of-date; the current release version '3.11' is available with R version '4.0';     
see https://bioconductor.org/install
> packageVersion('hgu219.db')
[1] ‘3.2.3’
> packageVersion('AnnotationDbi')
[1] ‘1.48.0’

Computer 2:

> R.Version()$version.string
[1] "R version 3.6.2 (2019-12-12)"
> library(BiocManager)
Bioconductor version 3.9 (BiocManager 1.30.4), ?BiocManager::install for help
Bioconductor version '3.9' is out-of-date; the current release version '3.11'
is available with R version '4.0'; see https://bioconductor.org/install
> packageVersion('hgu219.db')
[1] ‘3.2.3’
> packageVersion('AnnotationDbi')
[1] ‘1.46.0’
Reply written 4 months ago by james

And what do you get when you compare the keys? For example, setdiff(computer1_keys, computer2_keys).

Reply written 4 months ago by MatthewP
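
The comparison suggested above could be sketched roughly as follows. This is a minimal illustration, not something posted in the thread: the helper name compare_keys, the file name in the comment, and the toy IDs are all hypothetical, and the real key vectors would come from calling keys(hgu219.db, keytype = 'UNIPROT') on each machine.

```r
# Hypothetical helper: compare two character vectors of keys,
# e.g. one saved from each machine.
compare_keys <- function(keys_a, keys_b) {
  list(
    only_in_a = setdiff(keys_a, keys_b),           # present in a, missing from b
    only_in_b = setdiff(keys_b, keys_a),           # present in b, missing from a
    n_shared  = length(intersect(keys_a, keys_b))  # number of keys common to both
  )
}

# On each machine the keys could be saved for later comparison, e.g.
# (requires hgu219.db, so shown commented out; file name is an assumption):
# saveRDS(keys(hgu219.db, keytype = "UNIPROT"), "uniprot_keys_computer1.rds")

# Toy stand-ins for the two saved key vectors (placeholder IDs, not real accessions).
computer1_keys <- c("ID_A", "ID_B", "ID_C")
computer2_keys <- c("ID_A", "ID_C")

res <- compare_keys(computer1_keys, computer2_keys)
print(res$only_in_a)
```

Reporting both directions of the difference shows whether one key set is a strict subset of the other or whether each installation has keys the other lacks.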

Sorry, but it sounds like you don't know the answer to my question?

One has a few hundred more UNIPROTs than the other.

Reply written 4 months ago by james

Exactly 361 more UNIPROTs with the older version of AnnotationDbi, i.e. the new version gives a subset of the old version.

Reply written 4 months ago by james

As @MatthewP observed, the new version of AnnotationDbi is dropping a few hundred UNIPROTs.

Just guessing, but is AnnotationDbi keeping a list of "stale" UNIPROTs?

Reply written 4 months ago by james

Lluís R. wrote:

You are using different Bioconductor versions on the two computers (3.10 and 3.9), which affects the underlying data that hgu219.db draws on. For instance, the hgu219.db version on computer 2 is newer than the AnnotationDbi version, which might be one of the reasons for this, as hgu219.db uses data and methods provided by AnnotationDbi and other Bioconductor packages for that release.

Run BiocManager::valid() and follow its advice to set up a valid Bioconductor installation, and use the same Bioconductor version on both machines if you want consistent results.

Answer written 4 months ago by Lluís R.
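
To make the two installations easier to compare side by side, one option is to print the relevant package versions on each machine in a single step. The sketch below is only an illustration: the helper name installed_version is hypothetical, and org.Hs.eg.db is included because it is raised elsewhere in this thread as a possible source of the difference. It complements BiocManager::valid() rather than replacing it.

```r
# Hypothetical helper: return the installed version of a package as a string,
# or NA if the package is not installed.
installed_version <- function(pkg) {
  # system.file() returns "" when the package cannot be found
  if (nzchar(system.file(package = pkg))) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Packages mentioned in this thread
pkgs <- c("AnnotationDbi", "hgu219.db", "org.Hs.eg.db")
versions <- vapply(pkgs, installed_version, character(1))

print(R.version.string)
print(versions)
```

Running this on both computers and comparing the output shows at a glance which packages differ, including ones that were not upgraded deliberately.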
benformatics wrote:

Based on my limited investigation, I'm not sure the cause is AnnotationDbi; more likely it is the version of the org.Hs.eg.db package you have installed. I think the lookups mainly go through org.Hs.eg.db, finding matches via the Entrez ID. You can also swap hgu219.db for org.Hs.eg.db and the output is the same. I didn't see any actual UNIPROT IDs directly in the hgu219.db package.

This does seem to be a potentially serious issue. It is unclear why UniProt IDs were dropped, and which ones. I see you have already made a Bioconductor post, which is good.

To expand on this, I looked at the setdiff between installations: in my case the newer version also loses IDs, but not as many as you reported.

As an example, one accession (Q6ZP68) disappears completely in the newer version despite being annotated as reviewed in UniProt.

Computer 1: hgu219.db_3.2.3, org.Hs.eg.db_3.11.4, AnnotationDbi_1.50.3

> select(hgu219.db, keys=c("Q6ZP68"),columns=c("SYMBOL","GENENAME","ENTREZID"), keytype="UNIPROT")
Error in .testForValidKeys(x, keys, keytype, fks) :
  None of the keys entered are valid keys for 'UNIPROT'. Please use the keys method to see a listing of valid arguments.
> select(hgu219.db, keys=c("ATP11AUN"),columns=c("GENENAME","ENTREZID","UNIPROT"), keytype="SYMBOL")
'select()' returned 1:1 mapping between keys and columns
    SYMBOL                 GENENAME ENTREZID UNIPROT
1 ATP11AUN ATP11A upstream neighbor   400165    <NA>

Computer 2: hgu219.db_3.2.3, org.Hs.eg.db_3.8.2, AnnotationDbi_1.46.

> select(hgu219.db, keys=c("Q6ZP68"),columns=c("SYMBOL","GENENAME","ENTREZID"), keytype="UNIPROT")
'select()' returned 1:1 mapping between keys and columns
  UNIPROT   SYMBOL                 GENENAME ENTREZID
1  Q6ZP68 ATP11AUN ATP11A upstream neighbor   400165
> select(hgu219.db, keys=c("ATP11AUN"),columns=c("GENENAME","ENTREZID","UNIPROT"), keytype="SYMBOL")
'select()' returned 1:1 mapping between keys and columns
    SYMBOL                 GENENAME ENTREZID UNIPROT
1 ATP11AUN ATP11A upstream neighbor   400165  Q6ZP68
Answer written 4 months ago by benformatics • modified 4 months ago

The official response is that responsibility for the missing annotations lies with NCBI. The R packages just wrap the publicly available data.

https://support.bioconductor.org/p/134782/

Reply written 4 months ago by benformatics