Question: Need to convert gene IDs for GAGE pathway analysis
gravatar for Andrea.Wall
3.9 years ago by
Andrea.Wall10 wrote:

Hi everybody,

I'm trying to perform a pathway analysis with the R Bioconductor packages Gage/Pathview on honey bee RNA-seq data. I think I got to more or less write a script that would work, except it doesn't because my gene expression data uses gene IDs from BeeBase, while the Kegg package of the honey bee uses ncbi gene ids, so I would need to convert the gene names of my differential expression results. Here's the code I wrote so far:








lapply(kg.ame[1:3], head)


#Use the signaling pathways and metabolic pathways in the analysis. Extract those pathways:$kg.sets[kg.ame$sigmet.idx]

#now call gage with your data like:
fc.kegg.p <- gage(exp.fc, gsets =, ref = NULL, samp = NULL, gene.idtype="RefSeq" )

sel <- fc.kegg.p$greater[, "q.val"] < 1 &
  !$greater[, "q.val"])

path.ids <- rownames(fc.kegg.p$greater)[sel]

sel.l <- fc.kegg.p$less[, "q.val"] < 1 &

path.ids.l <- rownames(fc.kegg.p$less)[sel.l]

path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)[1:3], head, 3)






#view first 3 pathways as demo
pv.out.list <- sapply(path.ids2, function(pid) pathview( exp.fc,, 
  species="ame", out.suffix=outsuffix, gene.idtype="Kegg"))`

So I think the problem is that my kg.sets look like:

$kg.sets$`ame04512 ECM-receptor interaction`
 [1] "100576742" "406097"    "408393"    "408551"    "408552"    "408826"    "409448"   
 [8] "409474"    "409924"    "410038"    "410775"    "412663"    "412941"    "413739"   
[15] "413782"    "724950"    "726736"    "726871"

While my exp.fc looks like:

GB40001-RA GB40002-RA GB40003-RA GB40004-RA GB40005-RA GB40006-RA 
 0.0000000         NA -0.1594682         NA -0.1337054 -0.1054865

Looking around for related posts I've found this solution ( to create a mapping table for conversion:

"You can download the gene_info data file from NCBI ftp site: under unix/linux shell, do: gunzip All_Invertebrates.gene_info.gz egrep '(^7460)' All_Invertebrates.gene_info >>am.gene_info.txt

Column 2-6 are (Entrez) GeneID, Symbol, LocusTag, Synonyms, dbXrefs. You see GB numbers in Column 5 and 6. After get the ID mapping, you can use mol.sum in pathview package to merge redundant IDs into a unified expression level. ?pathview::mol.sum "

But my problem is that I don't know how to do the last bit of what is suggested here, getting the ID mapping in r and using mol.sum to merge the redundant IDs. Could someone check my script so far and help writing the missing lines of the script?

Thanks a lot in advance for your kind help.

Best, Joanito

rna-seq R • 1.7k views
ADD COMMENTlink modified 3.9 years ago • written 3.9 years ago by Andrea.Wall10
gravatar for EagleEye
3.9 years ago by
EagleEye6.6k wrote:

Did you try using GeneSCF (Supports gene ids and symbols).

Example for supported ids for ame, Apis mellifera (honey bee),

**Geneids**   **GeneSymbols**

552457  GMCOX6, GB16735

410745  GMCOX7, GB13150

410744  GB19663

726243  Flo2, GB15521

410743  GMCOX9, GB17446

410742  GMCOX10, GB16967

408609  GB10050

./geneSCF -m=update -i=INPUTgene.list -t=gid -db=KEGG -o=/ExistingOUTPUTfolder/ -org=ame --plot=yes --background=#TotalGenesFromAnnotation

ADD COMMENTlink modified 3.9 years ago • written 3.9 years ago by EagleEye6.6k
gravatar for Andrea.Wall
3.9 years ago by
Andrea.Wall10 wrote:

Thanks EagleEye I will give that a try. However, I would like to also do the Gge/Pathview analysis for several reasons. All that I'd need is to convert my gene IDs, and I guess it shouldn't be that difficult when I have a table with both ID types to make the conversion, but I'm not sure how that is done in R. Can someone please help with this? Thanks a lot!

ADD COMMENTlink written 3.9 years ago by Andrea.Wall10
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1072 users visited in the last hour