Question

Is there an R package that pulls up gene functional annotations with gene symbols as input?

7

9.4 years ago

karthik ▴ 90

I have a list of genes, each of which I would like to independently annotate with a function and/or pathway using "keywords" associated with that gene.

Is there an R package that returns functional keywords when the gene symbol (e.g. BRCA1 or IL2RA) is used as a query?

I am not looking for functional enrichments of the set of genes as a whole, but keywords for each gene independent of others.

This seems like a very simple thing that would be a commonplace task. But I don't see any packages in R that allow me to do that. Any help would be appreciated.

Karthik

gene RNA-Seq annotation • 17k views

updated 2.2 years ago by Ram 43k • written 9.4 years ago by karthik ▴ 90

Ram · Answer 1 · 2014-11-21

You can use the mygene package, available via Bioconductor: http://www.bioconductor.org/packages/release/bioc/html/mygene.html

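To install (current Bioconductor releases use BiocManager; the older biocLite script has been retired):

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("mygene")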

Load: library(mygene)

1. Create a vector of your gene symbols, Entrez Gene IDs, or other identifiers (various inputs are acceptable as long as they are properly scoped):

> xli <- c('BRCA1', 
       'BRCA2', 
       'SOX2', 
       'MYC')

2. Run the search for the items in your list (in this case scoping to gene symbols, returning Entrez Gene IDs and Gene Ontology terms, and restricting to human genes), then display the results:

> res <- queryMany(xli, scopes='symbol', fields=c('entrezgene', 'go'), species='human')
> res

Results:

DataFrame with 4 rows and 6 columns
     go.CC    go.MF    go.BP       query entrezgene         _id
    <List>   <List>   <List> <character>  <integer> <character>
1 ######## ######## ########       BRCA1        672         672
2 ######## ######## ########       BRCA2        675         675
3 ######## ######## ########        SOX2       6657        6657
4 ######## ######## ########         MYC       4609        4609
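Since the question asks for per-gene functional keywords rather than only GO terms, the same query can request descriptive fields instead. The sketch below is a minimal wrapper, not part of mygene: `annotate_symbols` and its `query_fun` argument are hypothetical names, and it assumes that `name` and `summary` are accepted values for `fields` (check the MyGene.info field list before relying on them). Nothing is sent over the network until the function is called.

```r
# Hypothetical helper (not part of mygene): clean a vector of gene symbols
# and hand it to a query function. The default, mygene::queryMany, is only
# evaluated when the function is actually called without query_fun.
annotate_symbols <- function(symbols,
                             fields = c("name", "summary"),  # assumed field names
                             query_fun = mygene::queryMany) {
  # Drop missing, empty and duplicated symbols before querying
  symbols <- unique(symbols[!is.na(symbols) & nzchar(symbols)])
  query_fun(symbols, scopes = "symbol", fields = fields, species = "human")
}

# Example call (requires the mygene package and network access):
# res2 <- annotate_symbols(c("BRCA1", "IL2RA"))
```

Passing the query function as an argument keeps the cleaning step separate from the web call, so the helper can be tried with a stand-in function before running it against the real service.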

3. Display the records of interest (in this case the cellular component Gene Ontology terms for the first record; the biological process and molecular function terms are available the same way):

> res[1, 'go.CC'][[1]]

Results (again, just the cellular component terms; change 'CC' to 'BP' or 'MF' for the other GO categories):

                         term   pubmed         id evidence
1    ubiquitin ligase complex 14976165 GO:0000151      NAS
2                     nucleus 17525340 GO:0005634      IDA
3                 nucleoplasm       NA GO:0005654      TAS
4                  chromosome       NA GO:0005694      ISS
5                   cytoplasm       NA GO:0005737      IDA
6             plasma membrane       NA GO:0005886      IDA
7  gamma-tubulin ring complex 12214252 GO:0008274      NAS
8   ribonucleoprotein complex 18809582 GO:0030529      IDA
9         BRCA1-BARD1 complex 12890688 GO:0031436      IDA
10            protein complex  9774970 GO:0043234      IDA
11            BRCA1-A complex 17525340 GO:0070531      IDA
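To get one keyword string per gene from tables like the one above, the `term` column of each per-gene table can be collapsed. This is a minimal sketch under stated assumptions: `collapse_go_terms` is a hypothetical helper, and it assumes each element of the GO column is either a table (or list) with a `term` entry or a placeholder such as NULL or NA for genes without annotation. The toy input only mimics that shape; it is not real query output.

```r
# Hypothetical helper: collapse the 'term' entry of each per-gene GO table
# into a single separator-delimited keyword string.
collapse_go_terms <- function(go_list, sep = "; ") {
  vapply(go_list, function(tbl) {
    # Anything that is not a table/list (NULL, NA) counts as "no annotation"
    if (!is.list(tbl)) return(NA_character_)
    terms <- unique(tbl[["term"]])
    if (length(terms) == 0) return(NA_character_)
    paste(terms, collapse = sep)
  }, character(1))
}

# Toy input with the assumed shape: one table per gene, NULL when empty
toy <- list(
  data.frame(term = c("nucleus", "cytoplasm", "nucleus"),
             stringsAsFactors = FALSE),
  NULL
)
keywords <- collapse_go_terms(toy)
```

With real results, something like `collapse_go_terms(as.list(res$go.CC))` should give one string per queried gene, assuming the column converts cleanly to a plain list; the same call on `go.BP` or `go.MF` would cover the other categories.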

EagleEye · Answer 2 · 2014-11-21

0

9.4 years ago

EagleEye 7.5k

If you are working with human genes and running on Linux, you can also use GeneSCF (Gene Set Clustering based on Functional annotation).

updated 2.2 years ago by Ram 43k • written 9.4 years ago by EagleEye 7.5k

score 0 · Answer 3 · 2019-11-18

Even though I tend to use the web interface, I believe you can accomplish what you want with the R package for Enrichr:

https://cran.r-project.org/web/packages/enrichR/index.html

The only caveat is that you'll need to know which gene sets you want to test ahead of time (instead of browsing through them interactively).