Question: Difference between GO.db, biomaRt, and org.Hs.eg.db in GO annotations
Asked 11 months ago by lihaone:

There are many R packages from which GO annotations can be retrieved, for example GO.db, biomaRt, and org.Hs.eg.db. Are there any differences between the annotations obtained from these resources? Which one is best in terms of update frequency and ease of use?

An example adapted from https://support.bioconductor.org/p/38420/:

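# Genes annotated to GO:0006913 according to org.Hs.eg.db
library(GO.db)
library(org.Hs.eg.db)

# Reverse the Entrez ID -> GO map and look up the Entrez IDs for this term
res <- get("GO:0006913", revmap(org.Hs.egGO))
# Translate the Entrez IDs to gene symbols
res <- do.call('c', mget(res, org.Hs.egSYMBOL))
sort(unique(res))

 [1] "AAAS" "ANKRD54" "ANP32A" "CAMK1" "CDK5" "CITED1" "EIF5A" "FBXO22" "MYBBP1A" "NEUROD1" "NPM1" "NSRP1" "NUP205"
[14] "NUP98" "RGS14" "RSRC1" "SET" "UPF3A"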

# The same query against the Ensembl BioMart server
library(biomaRt)

ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# Gene symbols annotated to the GO term, retrieved from the live server
res <- getBM(c("hgnc_symbol"), filters = "go", values = "GO:0006913", mart = ensembl)
sort(unique(res[,1]))

 [1] "AAAS" "ANKRD54" "ANP32A" "ANP32D" "ANP32E" "CAMK1" "CDK5" "CITED1" "EIF5A" "FBXO22" "MYBBP1A" "NEUROD1" "NPM1"
[14] "NSRP1" "NUP155" "NUP205" "NUP35" "NUP54" "NUP58" "NUP98" "RAN" "RGS14" "RSRC1" "SET" "UPF3A"
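To see exactly where the two results printed above disagree, the symbol lists can be compared with base R set operations. This is only a sketch: the vectors are copied by hand from the output in this post, and the names `org_db_symbols` and `biomart_symbols` are made up for illustration. In a live session you would compare the two `res` objects instead.

```r
# Symbols returned by org.Hs.eg.db for GO:0006913 (copied from the post)
org_db_symbols <- c("AAAS", "ANKRD54", "ANP32A", "CAMK1", "CDK5", "CITED1",
                    "EIF5A", "FBXO22", "MYBBP1A", "NEUROD1", "NPM1", "NSRP1",
                    "NUP205", "NUP98", "RGS14", "RSRC1", "SET", "UPF3A")

# Symbols returned by biomaRt for the same term (copied from the post)
biomart_symbols <- c("AAAS", "ANKRD54", "ANP32A", "ANP32D", "ANP32E", "CAMK1",
                     "CDK5", "CITED1", "EIF5A", "FBXO22", "MYBBP1A", "NEUROD1",
                     "NPM1", "NSRP1", "NUP155", "NUP205", "NUP35", "NUP54",
                     "NUP58", "NUP98", "RAN", "RGS14", "RSRC1", "SET", "UPF3A")

# Genes reported by only one of the two resources
only_biomart <- setdiff(biomart_symbols, org_db_symbols)
only_org_db  <- setdiff(org_db_symbols, biomart_symbols)
# Genes reported by both
shared <- intersect(org_db_symbols, biomart_symbols)

print(only_biomart)
# "ANP32D" "ANP32E" "NUP155" "NUP35" "NUP54" "NUP58" "RAN"
print(length(only_org_db))
# 0
```

For this term and these package versions, every org.Hs.eg.db symbol also appears in the biomaRt result, and biomaRt returns seven additional genes.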

sessionInfo()

R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.10.4-3 gap_1.1-20 biomaRt_2.32.1 GO.db_3.4.1 org.Hs.eg.db_3.4.1 AnnotationDbi_1.38.2
[7] IRanges_2.10.5 S4Vectors_0.14.7 Biobase_2.36.2 BiocGenerics_0.22.1 limma_3.32.10

loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 bit_1.1-12 rlang_0.1.4 blob_1.1.0 plyr_1.8.4 tools_3.4.1 DBI_0.7 bit64_0.9-7
[9] digest_0.6.12 tibble_1.3.4 bitops_1.0-6 RCurl_1.95-4.8 memoise_1.1.0 RSQLite_2.0 compiler_3.4.1 XML_3.98-1.9
[17] pkgconfig_2.0.1

Tags: go.db, go, biomart, R
modified 10 weeks ago by Dunja • written 11 months ago by lihaone

https://www.bioconductor.org/help/course-materials/2011/BioC2011/LabStuff/AnnotationSlidesBioc2011.pdf

In my opinion, these slides show all the differences.

written 11 months ago by natasha.sernova

Thanks, Natasha, for the slides. What I really wanted to know is which of these resources is the "correct" one to use for retrieving GO annotations (for example, mapping a GO ID to Entrez IDs). I would like your suggestions so that I (and many other newcomers to the field) do not need to spend time choosing among the resources.
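For the GO ID to Entrez ID case, one possible sketch uses the `select()` interface from AnnotationDbi on org.Hs.eg.db. The helper name `go_to_entrez` and its `include_descendants` argument are hypothetical, made up for this illustration. The "GO" key type covers genes annotated directly to the term, while "GOALL" also covers genes annotated to its descendant terms, so the choice of key type alone can change the size of the result.

```r
# Sketch: map a GO ID to Entrez gene IDs (and symbols) with org.Hs.eg.db.
# `go_to_entrez` is a hypothetical helper, not part of any package.
go_to_entrez <- function(go_id, include_descendants = FALSE) {
  if (!requireNamespace("org.Hs.eg.db", quietly = TRUE) ||
      !requireNamespace("AnnotationDbi", quietly = TRUE)) {
    stop("Install org.Hs.eg.db and AnnotationDbi from Bioconductor first")
  }
  # "GO": direct annotations only.
  # "GOALL": also includes annotations to descendant terms.
  keytype <- if (include_descendants) "GOALL" else "GO"
  hits <- AnnotationDbi::select(org.Hs.eg.db::org.Hs.eg.db,
                                keys = go_id,
                                keytype = keytype,
                                columns = c("ENTREZID", "SYMBOL"))
  # One gene can appear several times (one row per evidence code),
  # so keep each gene once.
  unique(hits[, c("ENTREZID", "SYMBOL")])
}
```

A call such as `go_to_entrez("GO:0006913")` would return a data frame of Entrez IDs and symbols from whichever org.Hs.eg.db snapshot is installed.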

written 11 months ago by lihaone

I think scientists decide what looks most reliable for their purposes depending on their needs. For example, of the many protein databases, SwissProt looks the most reliable because it is manually curated. Unfortunately, it is small. See below what I found about GOA, Bioconductor, and BioMart.

If you would like to know what is available on Biostars about all three, go to the upper left corner of the page, click LATEST, and type GO Bioconductor Biomart into the empty line that appears in the middle. You will find more than 10 posts.

For example,

A: Tool for Human Gene Functional classes in R

A: From Ensembl Transcript Id To Go Term(S), Is There A Mapping?

What I found elsewhere is below.

The three articles below are about GO and GOA reliability.

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-6-S1-S17

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0040519

http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000431

Biomart

http://www.biomart.org/

“A large number of servers that provide access to a wide range of research data have been set up by the BioMart community. Using BioMart’s unique data federation technology, a Central Portal was established to provide a convenient single point of access to all of these data, which is distributed worldwide.”

http://www.biomart.org/other/biomart_0.9_0_documentation.pdf

Bioconductor

http://bioconductor.org/packages/release/data/annotation/html/org.Hs.eg.db.html

“Annotation. The Bioconductor project provides software for associating microarray and other genomic data in real time with biological metadata from web databases such as GenBank, Entrez genes and PubMed (annotate package). Functions are also provided for incorporating the results of statistical analysis in HTML reports with links to annotation web resources. Software tools are available for assembling and processing genomic annotation data, from databases such as GenBank, the Gene Ontology Consortium, Entrez genes, UniGene, the UCSC Human Genome Project (AnnotationDbi package). Annotation data packages are distributed to provide mappings between different probe identifiers (e.g. Affy IDs, Entrez genes, PubMed). Customized annotation libraries can also be assembled.”

modified 11 months ago • written 11 months ago by natasha.sernova

Thanks, Natasha, for these links. I updated my question above.

written 11 months ago by lihaone
Answer written 11 months ago by Lluís R. (Spain, Barcelona):

GO.db and org.Hs.eg.db are snapshots of the GO data: GO.db holds the ontology itself (the terms and their relationships), and org.Hs.eg.db holds the mappings between human genes and GO terms. GO.db is updated every 6 months with each release of Bioconductor, and org.Hs.eg.db is updated at the same time. Both follow the Bioconductor release schedule, as explained in Annotation Packages:

Annotation packages contain lightly or non-curated data from a public source and are updated with each Bioconductor release (every 6 months).

biomaRt connects to the server where the information is stored, so it will be the most up to date.

If you want a stable release, you can use either GO.db or org.Hs.eg.db. If you want the most recent data available on the server every time you run an analysis, you can use biomaRt.
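To check which snapshot an installed org.Hs.eg.db actually contains, one option is to read the `metadata` table of its SQLite database. The sketch below assumes that table has `name` and `value` columns and that date stamps are stored in rows whose name contains "SOURCEDATE". The helper `source_dates` is hypothetical, and the live lookup only runs if org.Hs.eg.db and DBI are installed.

```r
# `source_dates` is a hypothetical helper: given a metadata table with
# columns `name` and `value`, keep only the rows that record source dates.
source_dates <- function(meta) {
  meta[grepl("SOURCEDATE", meta$name), , drop = FALSE]
}

# Live lookup, guarded so the sketch also runs without the packages installed
if (requireNamespace("org.Hs.eg.db", quietly = TRUE) &&
    requireNamespace("DBI", quietly = TRUE)) {
  # org.Hs.eg_dbconn() returns the connection to the package's SQLite file
  meta <- DBI::dbGetQuery(org.Hs.eg.db::org.Hs.eg_dbconn(),
                          "SELECT * FROM metadata")
  print(source_dates(meta))
}
```

Comparing these stamps with the date of your analysis gives a rough idea of how far the packaged snapshot lags behind a live biomaRt query.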

modified 10 weeks ago • written 11 months ago by Lluís R.

Thanks, Lluís, for this information. This is exactly what I wanted to know!

written 11 months ago by lihaone
Answer written 10 weeks ago by Dunja (MPI of Psychiatry, Munich):

Hi Lluis,

Can you tell me where you found the information that org.Hs.eg.db is updated every 6 months? Can you share a link or something?

Thanks! Dunja

written 10 weeks ago by Dunja

Yes, it is part of the basic information about how Bioconductor packages are updated. You can find it explained in Annotation Packages:

Annotation packages contain lightly or non-curated data from a public source and are updated with each Bioconductor release (every 6 months).

I'll add it to the original answer.

modified 10 weeks ago • written 10 weeks ago by Lluís R.

Hi Lluis,

Thank you for your reply. Yes, the package is updated biannually, but I have to mention, in case someone is not aware of it, that the KEGG source data this package uses are over 7 years old. According to the Bioconductor manual, p. 17:

Mappings were based on data provided by: KEGG GENOME ftp://ftp.genome.jp/pub/kegg/genomes With a date stamp from the source of: 2011-Mar15

https://bioconductor.org/packages/release/data/annotation/manuals/org.Hs.eg.db/man/org.Hs.eg.db.pdf

So if one wants to map genes to KEGG pathway IDs, one should use the KEGGREST package or something else.

written 7 weeks ago by Dunja

Totally true, but that was not the question I was answering; it asked specifically about three packages.

This is duly noted in the KEGG.db package and in the information in the data itself. It is due to a change in the KEGG license. One can still access the whole KEGG database through the API, though.

written 7 weeks ago by Lluís R.