GAGE - eg2sym (ID conversion)
4.7 years ago
mbk0asis ▴ 620

Hello.

I'm trying to run a KEGG pathway analysis with "GAGE", but I can't get it to run because the ID conversion isn't working for me.

First of all, I'm dealing with "cow" RNA-seq data.

Here's the R code I used...

library(gage)
kg.bta <- kegg.gsets("bta")
kegg.gs <- kg.bta$kg.sets[kg.bta$sigmet.idx]
kegg.gs.sym <- lapply(kegg.gs, eg2sym)
lapply(kegg.gs.sym[1:3], head)

and the results are...

$`bta00970 Aminoacyl-tRNA biosynthesis`
[1] NA NA NA NA NA NA

$`bta02010 ABC transporters`
[1] NA NA NA NA NA NA

$`bta03008 Ribosome biogenesis in eukaryotes`
[1] NA NA NA NA NA NA

I guess the "eg2sym" only works for "human".

I know I could instead convert the IDs in my RNA-seq data to Entrez, but I lost about 5000 genes when I tried that conversion, and I'm worried that losing those genes might distort the results.

So, I wonder if I can use "eg2sym" on cow data, and if not, how I can do it.

Thank you!

GAGE RNA-Seq GSEA KEGG • 2.1k views
4.7 years ago
bigmawen ▴ 400

You used the eg2sym function, which only works for human data. The pathview package provides a set of more general gene ID conversion functions, e.g. eg2id, id2eg, and geneannot.map. These functions work for 19 major research species. For more details:

library(pathview)
?eg2id

The following code would work for you.

kegg.gs.sym <- lapply(kegg.gs, function(x) {
  # eg2id returns a two-column matrix: Entrez ID, then the requested ID type
  syms <- eg2id(x, org = "Bt", category = "symbol")
  return(syms[, 2])
})

Result shown:

> lapply(kegg.gs.sym[1:3],head)
$`bta00970 Aminoacyl-tRNA biosynthesis`
[1] "EARS2" "VARS2" "SARS"  "WARS"  "YARS"  "SARS2"

$`bta02010 ABC transporters`
[1] "LOC100296627" "ABCA2"        "ABCC9"        "ABCB4"        "ABCC6"       
[6] "LOC101909228"

$`bta03008 Ribosome biogenesis in eukaryotes`
[1] "RMRP"    "SPATA5"  "RN28S1"  "RN5-8S1" "AK6"     NA
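
Note the NA at the end of the third set: not every Entrez ID has a symbol. If that matters downstream, one option is to count and drop the unmapped entries. This is a minimal sketch in base R, using a toy list in place of the real kegg.gs.sym (values copied from the output above); the names n.unmapped and kegg.gs.sym.clean are just illustrative.

```r
# Toy stand-in for kegg.gs.sym; values copied from the output above
kegg.gs.sym <- list(
  "bta00970 Aminoacyl-tRNA biosynthesis" =
    c("EARS2", "VARS2", "SARS", "WARS", "YARS", "SARS2"),
  "bta03008 Ribosome biogenesis in eukaryotes" =
    c("RMRP", "SPATA5", "RN28S1", "RN5-8S1", "AK6", NA)
)

# Count how many IDs failed to map in each gene set
n.unmapped <- sapply(kegg.gs.sym, function(s) sum(is.na(s)))

# Drop unmapped entries and any duplicate symbols
kegg.gs.sym.clean <- lapply(kegg.gs.sym, function(s) unique(s[!is.na(s)]))
```

Checking n.unmapped first shows whether any single pathway loses a large share of its genes before you filter.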

Thank you! It works perfectly.

4.7 years ago
Guangchuang Yu ★ 2.4k

bitr in clusterProfiler is another choice for you.

> require(org.Bt.eg.db)
> sample_eg = sample(keys(org.Bt.eg.db), 100)
> head(sample_eg)
[1] "104971204" "107131358" "107131847" "788587"    "789592"    "519105"
> require(clusterProfiler)
> eg2sym = bitr(sample_eg, fromType='ENTREZID', toType="SYMBOL", OrgDb=org.Bt.eg.db)
> head(eg2sym)
   ENTREZID       SYMBOL
1 104971204 LOC104971204
2 107131358 LOC107131358
3 107131847 LOC107131847
4    788587    LOC788587
5    789592    LOC789592
6    519105    LOC519105
> tail(eg2sym)
     ENTREZID       SYMBOL
95  104977419       CALEST
96  104974952 LOC104974952
97  107080383       INSINT
98  100336249 LOC100336249
99     504224  C15H11orf16
100    529919    LOC529919
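
To apply a two-column mapping like this to a list of gene sets, one approach is to turn it into a named lookup vector. This is a rough sketch in base R: the small map data frame stands in for real bitr output (rows copied from the tail above), and gs is a hypothetical gene set whose last ID is made up so that it fails to map.

```r
# Stand-in for a bitr result; rows copied from the tail() output above
map <- data.frame(
  ENTREZID = c("104977419", "107080383", "504224"),
  SYMBOL   = c("CALEST", "INSINT", "C15H11orf16"),
  stringsAsFactors = FALSE
)

# Named vector: names are Entrez IDs, values are symbols
lookup <- setNames(map$SYMBOL, map$ENTREZID)

# Hypothetical gene set; "999999999" is invented and has no mapping
gs <- list(toy.set = c("504224", "104977419", "999999999"))

# Convert each set; IDs missing from the lookup become NA
gs.sym <- lapply(gs, function(ids) unname(lookup[ids]))

# Fraction of IDs that mapped, useful for judging how much is lost
mapped.frac <- mean(!is.na(unlist(gs.sym)))
```

The mapped fraction gives a quick check on how many genes a conversion would drop, which was the concern raised in the question.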
4.7 years ago
EagleEye 6.9k

Have you tried GeneSCF?

Here is an example of retrieving sheep and cow KEGG annotations as a simple text file:

A: Gene ontology in sheep

For enrichment analysis, use:

./geneSCF -m=update -i=INPUTgene.list -t=gid -db=KEGG -o=/ExistingOUTPUTfolder/ -org=bta --plot=yes --background=#NumberOfBackgroundGenes

For complete information, see:

Gene Set Clustering based on Functional annotation (GeneSCF)
