GAGE - eg2sym (ID conversion)

7.7 years ago

mbk0asis ▴ 680

Hello.

I'm trying to run a KEGG pathway analysis using GAGE, and I'm having trouble because the ID conversion isn't working for me.

First of all, I'm working with cow (Bos taurus) RNA-seq data.

Here's the R code I used:

library(gage)
kg.bta <- kegg.gsets("bta")
kegg.gs <- kg.bta$kg.sets[kg.bta$sigmet.idx]
kegg.gs.sym <- lapply(kegg.gs, eg2sym)
lapply(kegg.gs.sym[1:3], head)

and the results are:

$`bta00970 Aminoacyl-tRNA biosynthesis`
[1] NA NA NA NA NA NA

$`bta02010 ABC transporters`
[1] NA NA NA NA NA NA

$`bta03008 Ribosome biogenesis in eukaryotes`
[1] NA NA NA NA NA NA

I suspect eg2sym only works for human data.

I know I could instead convert the IDs in my RNA-seq data to Entrez Gene IDs, but I lost about 5,000 genes in that conversion, and I'm worried that this loss might distort the results.

So, can I use eg2sym on cow data, and if not, how else can I convert the gene set IDs to symbols?

Thank you!

Tags: GAGE, RNA-Seq, GSEA, KEGG

Last updated 7.6 years ago by Guangchuang Yu ★ 2.6k • written 7.7 years ago by mbk0asis ▴ 680

7.7 years ago

EagleEye 7.5k

Have you tried GeneSCF?

Here is an example of retrieving sheep and cow KEGG annotations as a plain-text file:

A: Gene ontology in sheep

For enrichment analysis, use:

./geneSCF -m=update -i=INPUTgene.list -t=gid -db=KEGG -o=/ExistingOUTPUTfolder/ -org=bta --plot=yes --background=#NumberOfBackgroundGenes

For complete information, see:

Gene Set Clustering based on Functional annotation (GeneSCF)


score 3 · Accepted Answer · 2016-09-07

You used the eg2sym function, which only works for human data. The pathview package provides a set of more general gene ID conversion functions, e.g. eg2id, id2eg and geneannot.map. These functions work for 19 major research species. For more details:

library(pathview)
?eg2id

The following code should work for you:

library(pathview)
kegg.gs.sym <- lapply(kegg.gs, function(x) {
  # eg2id returns a matrix: column 1 = Entrez ID, column 2 = gene symbol
  syms <- eg2id(x, org = "Bt", category = "symbol")
  return(syms[, 2])
})

Result shown:

> lapply(kegg.gs.sym[1:3],head)
$`bta00970 Aminoacyl-tRNA biosynthesis`
[1] "EARS2" "VARS2" "SARS"  "WARS"  "YARS"  "SARS2"

$`bta02010 ABC transporters`
[1] "LOC100296627" "ABCA2"        "ABCC9"        "ABCB4"        "ABCC6"       
[6] "LOC101909228"

$`bta03008 Ribosome biogenesis in eukaryotes`
[1] "RMRP"    "SPATA5"  "RN28S1"  "RN5-8S1" "AK6"     NA
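The converted sets can still contain NA wherever an Entrez ID has no symbol in the annotation (see the last entry for bta03008 above), and a set may also contain repeated symbols. An optional tidy-up is to drop those entries before passing the sets to gage. This is a minimal base-R sketch; the helper name clean_gsets is hypothetical and not part of gage or pathview.

```r
# Hypothetical helper (not part of gage or pathview):
# drop unmapped (NA) and duplicated symbols from each gene set in a named list.
clean_gsets <- function(gsets) {
  lapply(gsets, function(s) unique(s[!is.na(s)]))
}

# Usage, assuming kegg.gs.sym was built with eg2id as in the answer:
# kegg.gs.sym <- clean_gsets(kegg.gs.sym)
```

gage matches gene set members against the row names of the expression matrix, so after this conversion the rows of your RNA-seq data should also be labelled with gene symbols.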

score 2 · Accepted Answer · 2016-09-07

The bitr function in clusterProfiler is another option:

> require(org.Bt.eg.db)
> sample_eg = sample(keys(org.Bt.eg.db), 100)
> head(sample_eg)
[1] "104971204" "107131358" "107131847" "788587"    "789592"    "519105"
> require(clusterProfiler)
> eg2sym = bitr(sample_eg, fromType='ENTREZID', toType="SYMBOL", OrgDb=org.Bt.eg.db)
> head(eg2sym)
   ENTREZID       SYMBOL
1 104971204 LOC104971204
2 107131358 LOC107131358
3 107131847 LOC107131847
4    788587    LOC788587
5    789592    LOC789592
6    519105    LOC519105
> tail(eg2sym)
     ENTREZID       SYMBOL
95  104977419       CALEST
96  104974952 LOC104974952
97  107080383       INSINT
98  100336249 LOC100336249
99     504224  C15H11orf16
100    529919    LOC529919
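Because bitr returns a two-column data frame (ENTREZID, SYMBOL), one way to convert every KEGG gene set from the question is to map all unique Entrez IDs in a single bitr call and then look up each set in the result. This is a minimal base-R sketch of that lookup step. The helper map_gsets is hypothetical, and the commented usage assumes kegg.gs from the question plus the org.Bt.eg.db package.

```r
# Hypothetical helper: translate every ID in a named list of gene sets
# using a two-column mapping data frame, such as the one returned by bitr().
map_gsets <- function(gsets, map, from = "ENTREZID", to = "SYMBOL") {
  # Build a named lookup vector: names are source IDs, values are target IDs.
  # If an ID maps to several targets, only the first one is kept.
  keep <- !duplicated(map[[from]])
  lookup <- setNames(as.character(map[[to]][keep]),
                     as.character(map[[from]][keep]))
  # IDs absent from the mapping come back as NA.
  lapply(gsets, function(ids) unname(lookup[as.character(ids)]))
}

# Usage (assumes kegg.gs from the question):
# library(clusterProfiler); library(org.Bt.eg.db)
# map <- bitr(unique(unlist(kegg.gs)), fromType = "ENTREZID",
#             toType = "SYMBOL", OrgDb = org.Bt.eg.db)
# kegg.gs.sym <- map_gsets(kegg.gs, map)
```

Mapping the union of IDs once avoids calling bitr separately for each of the KEGG sets.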