Question

AnnotationDbi returns a different list of symbols than the one taken directly from the annotation database

5.0 years ago

zephyr_falcon ▴ 80

Hi.

I'm trying to annotate probe IDs (Affymetrix Mouse Gene 1.0 ST Array) with gene symbols.

I used the R package "mogene10sttranscriptcluster.db" (v8.7.0) for the annotation.

But here's the problem.

1) Using mogene10sttranscriptcluster.db directly

library(mogene10sttranscriptcluster.db)

a <- contents(mogene10sttranscriptclusterSYMBOL)

# a$'10344741'
# [1] NA

2) Using AnnotationDbi to extract the info

library(mogene10sttranscriptcluster.db)
library(AnnotationDbi)

k <- keys(mogene10sttranscriptcluster.db, keytype = "PROBEID")
b <- mapIds(mogene10sttranscriptcluster.db, keys=k, column=c("SYMBOL"), keytype="PROBEID")
b["10344741"]

# 10344741
# "Hnrnpa3" 

Both results have the same length: length(a) and length(b) are both 35556.

But some symbols are missing from (1) and present in (2).

Both approaches use the same database, mogene10sttranscriptcluster.db, so why do they give different results?

Does AnnotationDbi convert probe IDs to some other IDs first and then convert those to gene symbols?

The second one seems to have more symbols, so is that the one I should use?
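A minimal sketch of how the probes that differ between the two results could be listed. The objects below are toy stand-ins (only the 10344741 entry comes from the output above; "probe_2" and "GeneB" are made-up placeholders), so with the real data `a` and `b` would simply be the objects created in (1) and (2).

```r
# Toy stand-ins for the real objects (placeholder values, for illustration only):
# `a` mimics the list returned by contents(), `b` the named vector from mapIds().
a <- list("10344741" = NA_character_, "probe_2" = "GeneB")
b <- c("10344741" = "Hnrnpa3", "probe_2" = "GeneB")

# Flatten the list to a named character vector, aligned to the names of b.
# Only the first symbol per probe is kept, which is an assumption of this sketch.
a_vec <- vapply(a[names(b)], function(x) as.character(x)[1], character(1))

# Probes that have no symbol in (1) but do have one in (2).
only_in_b <- names(b)[is.na(a_vec) & !is.na(b)]
print(only_in_b)
```

Looking at a handful of the probes in `only_in_b` by hand may help to narrow down where the two lookups diverge.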

I'm very confused right now.
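On the question of an intermediate ID: one way to look at it, as a sketch, is to ask select() for the Entrez Gene ID and the symbol side by side for a single probe. The guard with requireNamespace() is only there so that the snippet does nothing harmful when the annotation package is not installed.

```r
# Sketch only: runs the lookup if the annotation package is installed.
probe <- "10344741"

if (requireNamespace("mogene10sttranscriptcluster.db", quietly = TRUE)) {
  library(mogene10sttranscriptcluster.db)
  # Request the Entrez Gene ID and the symbol together for one probe ID.
  res <- AnnotationDbi::select(mogene10sttranscriptcluster.db,
                               keys = probe,
                               columns = c("ENTREZID", "SYMBOL"),
                               keytype = "PROBEID")
  print(res)
} else {
  message("mogene10sttranscriptcluster.db is not installed; skipping lookup")
}
```

If the probe comes back with more than one row, that is worth noting, since mapIds() returns a single value per key by default.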

gene R AnnotationDbi mapIds • 1.4k views

updated 4.9 years ago by zx8754 • written 5.0 years ago by zephyr_falcon

score 1 · Accepted Answer · 2019-06-04

I found my own answer.

It seems that mogene10sttranscriptcluster.db relies on org.Mm.eg.db (the mouse organism package) for its annotation.

And the two approaches apparently ended up using different versions of org.Mm.eg.db.

I found this because, when I loaded a different version of org.Mm.eg.db, the same version of mogene10sttranscriptcluster.db (v8.7.0) produced different results.

So, check the version of your org.Mm.eg.db.
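A small sketch of how the installed versions could be checked in one go. The helper name `installed_version` is made up for this example, and the package list assumes the organism package for this mouse array is org.Mm.eg.db.

```r
# Helper (hypothetical name): returns the installed version of a package
# as a string, or NA if the package is not installed.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Packages involved in the lookup (assumed list for this mouse array).
pkgs <- c("mogene10sttranscriptcluster.db", "org.Mm.eg.db", "AnnotationDbi")
versions <- vapply(pkgs, installed_version, character(1))
print(versions)
```

Running this in each R session or library that gave a different result, together with sessionInfo(), should show whether the organism package version is what changed.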