Question: Is there an R package that pulls up gene functional annotations with gene symbols as input?
karthik (United States) wrote, 4.5 years ago:

I have a list of genes, each of which I would like to independently annotate with a function and/or pathway using "keywords" associated with that gene.

Is there an R package that returns functional keywords when the gene symbol (e.g. BRCA1 or IL2RA) is used as a query? 

I am not looking for functional enrichments of the set of genes as a whole, but keywords for each gene independent of others.

This seems like a simple, commonplace task, but I can't find any R package that does it. Any help would be appreciated.


Karthik 

 

 

Tags: rna-seq, annotation, gene
gtsueng (United States) wrote, 4.5 years ago:

You can use the mygene.R package available via Bioconductor: http://www.bioconductor.org/packages/release/bioc/html/mygene.html

To install:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("mygene")


Load:
library(mygene)

 

1. Create a list of your gene symbols, Entrez Gene IDs, or other identifiers (various input types are accepted, as long as the scopes argument matches the identifier type):

xli <- c('BRCA1', 'BRCA2', 'SOX2', 'MYC')

 

2. Run the query for the items in your list (here scoped to gene symbols, returning Entrez Gene IDs and Gene Ontology (GO) annotations, restricted to human genes) and display the results:

res <- queryMany(xli, scopes='symbol', fields=c('entrezgene', 'go'), species='human')
res

Results:

DataFrame with 4 rows and 6 columns
     go.CC    go.MF    go.BP       query entrezgene         _id
    <List>   <List>   <List> <character>  <integer> <character>
1 ######## ######## ########       BRCA1        672         672
2 ######## ######## ########       BRCA2        675         675
3 ######## ######## ########        SOX2       6657        6657
4 ######## ######## ########         MYC       4609        4609

 

3. Display the records of interest (here the cellular component GO terms for the first record; the biological process and molecular function GO terms can be retrieved the same way):

res[1, 'go.CC'][[1]]

Results (cellular component GO terms only; change 'CC' to 'BP' or 'MF' for the other GO branches):

                         term   pubmed         id evidence
1    ubiquitin ligase complex 14976165 GO:0000151      NAS
2                     nucleus 17525340 GO:0005634      IDA
3                 nucleoplasm       NA GO:0005654      TAS
4                  chromosome       NA GO:0005694      ISS
5                   cytoplasm       NA GO:0005737      IDA
6             plasma membrane       NA GO:0005886      IDA
7  gamma-tubulin ring complex 12214252 GO:0008274      NAS
8   ribonucleoprotein complex 18809582 GO:0030529      IDA
9         BRCA1-BARD1 complex 12890688 GO:0031436      IDA
10            protein complex  9774970 GO:0043234      IDA
11            BRCA1-A complex 17525340 GO:0070531      IDA
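The original question asked for keywords per gene rather than a full GO table, so here is a minimal sketch of collapsing each gene's GO terms into a single keyword string. It assumes each per-gene GO entry is a data.frame with a term column, as in the BRCA1 output above; go_keywords and gene_keywords are hypothetical helper names, not mygene functions, and the example input is three rows copied from that BRCA1 table.

```r
# Hypothetical helper (not part of mygene): collapse one gene's GO table
# into a single keyword string. Assumes a data.frame with a `term` column,
# like res[1, 'go.CC'][[1]] above; missing or empty input gives NA.
go_keywords <- function(go_df, sep = "; ") {
  if (is.null(go_df) || NROW(go_df) == 0) return(NA_character_)
  terms <- unique(as.character(go_df[["term"]]))  # drop duplicate terms, keep order
  paste(terms, collapse = sep)
}

# Hypothetical helper: one row per gene with its keyword string.
# `genes` are the symbols, `go_list` the matching per-gene GO tables.
gene_keywords <- function(genes, go_list, sep = "; ") {
  data.frame(
    gene = genes,
    keywords = unname(vapply(go_list, go_keywords, character(1), sep = sep)),
    stringsAsFactors = FALSE
  )
}

# Three rows copied from the BRCA1 cellular component table above.
brca1_cc <- data.frame(
  term = c("nucleus", "cytoplasm", "BRCA1-A complex"),
  id = c("GO:0005634", "GO:0005737", "GO:0070531"),
  stringsAsFactors = FALSE
)
go_keywords(brca1_cc)  # returns "nucleus; cytoplasm; BRCA1-A complex"

# With a real queryMany() result (assumes the go.BP elements are data.frames):
# gene_keywords(res$query, as.list(res$go.BP))
```

Collapsing to a string keeps one row per gene, which matches the "keywords for each gene independent of others" goal; keep the per-gene tables if you also need the GO IDs or evidence codes.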

@gtsueng, I have a similar question. I already have the GO IDs for my genes, but how do I extract specific information from those GO IDs for each gene? For example, I want to extract terms such as JAK-STAT cascade or cellular protein metabolic process. How do I do that?

Reply written 2.7 years ago by saamar.rajput20
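One way to approach the question above, sketched here under assumptions rather than as an established recipe: search each gene's GO table for a term pattern or a set of GO IDs. find_go_hits is a hypothetical helper, and the per-gene tables are assumed to have term and id columns as in the BRCA1 output above. This only matches terms a gene is directly annotated with; a gene annotated only to a more specific child term would be missed unless you also expand the GO hierarchy, for example with the offspring maps in Bioconductor's GO.db package.

```r
# Hypothetical helper (not part of mygene): for every gene, return the GO
# annotations whose term matches `pattern` (case-insensitive regex) or
# whose ID is in `ids`. `go_list` must be a NAMED list of per-gene GO
# tables with `term` and `id` columns.
find_go_hits <- function(go_list, pattern = NULL, ids = NULL) {
  hits <- lapply(names(go_list), function(gene) {
    df <- go_list[[gene]]
    if (is.null(df) || NROW(df) == 0) return(NULL)  # gene without annotations
    df <- as.data.frame(df, stringsAsFactors = FALSE)
    keep <- rep(FALSE, nrow(df))
    if (!is.null(pattern)) {
      keep <- keep | grepl(pattern, df$term, ignore.case = TRUE)
    }
    if (!is.null(ids)) keep <- keep | df$id %in% ids
    if (!any(keep)) return(NULL)
    data.frame(gene = gene,
               id = as.character(df$id[keep]),
               term = as.character(df$term[keep]),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)  # NULL entries are dropped; NULL if nothing matched
}

# Example input: three rows from the BRCA1 table above, plus a gene with no GO data.
brca1_cc <- data.frame(
  term = c("nucleus", "gamma-tubulin ring complex", "BRCA1-A complex"),
  id = c("GO:0005634", "GO:0008274", "GO:0070531"),
  stringsAsFactors = FALSE
)
go_list <- list(BRCA1 = brca1_cc, NO_GO = NULL)

find_go_hits(go_list, pattern = "complex")   # 2 rows, both for BRCA1
find_go_hits(go_list, ids = "GO:0005634")    # 1 row: nucleus

# With a real queryMany() result (assumes go.BP elements are data.frames):
# go_bp <- setNames(as.list(res$go.BP), res$query)
# find_go_hits(go_bp, pattern = "JAK-STAT cascade")
```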
EagleEye (Sweden) wrote, 4.5 years ago:

If you are working with human genes on Linux, you can also use Gene Set Clustering based on Functional annotation (GeneSCF).
