Question

Need to convert gene IDs for GAGE pathway analysis

0

Entering edit mode

7.7 years ago

Andrea.Wall ▴ 10

Hi everybody,

I'm trying to perform a pathway analysis with the R Bioconductor packages Gage/Pathview on honey bee RNA-seq data. I think I got to more or less write a script that would work, except it doesn't because my gene expression data uses gene IDs from BeeBase, while the Kegg package of the honey bee uses ncbi gene ids, so I would need to convert the gene names of my differential expression results. Here's the code I wrote so far:

`res_ut_ins.fc=res_ut_ins$log2FoldChange

names(res_ut_ins.fc)=rownames(res_ut_ins)

exp.fc=res_ut_ins.fc

out.suffix="deseq2"


require(gage)

datakegg.gs)

kg.ame=kegg.gsets("ame")

names(kg.ame)

lapply(kg.ame[1:3], head)

lapply(exp.fc[1:3],head)

#Use the signaling pathways and metabolic pathways in the analysis. Extract those pathways:
kegg.gs=kg.ame$kg.sets[kg.ame$sigmet.idx]

#now call gage with your data like:
fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL, gene.idtype="RefSeq" )

sel <- fc.kegg.p$greater[, "q.val"] < 1 &
  !is.na(fc.kegg.p$greater[, "q.val"])

path.ids <- rownames(fc.kegg.p$greater)[sel]

sel.l <- fc.kegg.p$less[, "q.val"] < 1 &
  !is.na(fc.kegg.p$less[,"q.val"])

path.ids.l <- rownames(fc.kegg.p$less)[sel.l]

path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)

lapplykegg.gs[1:3], head, 3)

lengthkegg.gs)

head(exp.fc)

str(exp.fc)

head(path.ids)

head(path.ids2)

head(fc.kegg.p$greater)

require(pathview)
#view first 3 pathways as demo
pv.out.list <- sapply(path.ids2, function(pid) pathview(
  gene.data= exp.fc, pathway.id=pid, 
  species="ame", out.suffix=outsuffix, gene.idtype="Kegg"))`

So I think the problem is that my kg.sets look like:

$kg.sets$`ame04512 ECM-receptor interaction`
 [1] "100576742" "406097"    "408393"    "408551"    "408552"    "408826"    "409448"   
 [8] "409474"    "409924"    "410038"    "410775"    "412663"    "412941"    "413739"   
[15] "413782"    "724950"    "726736"    "726871"

While my exp.fc looks like:

GB40001-RA GB40002-RA GB40003-RA GB40004-RA GB40005-RA GB40006-RA 
 0.0000000         NA -0.1594682         NA -0.1337054 -0.1054865

Looking around for related posts I've found this solution (http://seqanswers.com/forums/archive/index.php/t-35472.html) to create a mapping table for conversion:

"You can download the gene_info data file from NCBI ftp site: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Invertebrates/All_Invertebrates.gene_info.gz under unix/linux shell, do: gunzip All_Invertebrates.gene_info.gz egrep '(^7460)' All_Invertebrates.gene_info >>am.gene_info.txt

Column 2-6 are (Entrez) GeneID, Symbol, LocusTag, Synonyms, dbXrefs. You see GB numbers in Column 5 and 6. After get the ID mapping, you can use mol.sum in pathview package to merge redundant IDs into a unified expression level. ?pathview::mol.sum "

But my problem is that I don't know how to do the last bit of what is suggested here, getting the ID mapping in r and using mol.sum to merge the redundant IDs. Could someone check my script so far and help writing the missing lines of the script?

Thanks a lot in advance for your kind help.

Best, Joanito

RNA-Seq R • 2.8k views

ADD COMMENT • link 7.7 years ago by Andrea.Wall ▴ 10

score 0 · Answer 1 · 2016-08-15

Did you try using GeneSCF (Supports gene ids and symbols).

Example for supported ids for ame, Apis mellifera (honey bee),

**Geneids**   **GeneSymbols**

552457  GMCOX6, GB16735

410745  GMCOX7, GB13150

410744  GB19663

726243  Flo2, GB15521

410743  GMCOX9, GB17446

410742  GMCOX10, GB16967

408609  GB10050

./geneSCF -m=update -i=INPUTgene.list -t=gid -db=KEGG -o=/ExistingOUTPUTfolder/ -org=ame --plot=yes --background=#TotalGenesFromAnnotation

score 0 · Answer 2 · 2016-08-16

0

Entering edit mode

7.7 years ago

Andrea.Wall ▴ 10

Thanks EagleEye I will give that a try. However, I would like to also do the Gge/Pathview analysis for several reasons. All that I'd need is to convert my gene IDs, and I guess it shouldn't be that difficult when I have a table with both ID types to make the conversion, but I'm not sure how that is done in R. Can someone please help with this? Thanks a lot!

ADD COMMENT • link 7.7 years ago by Andrea.Wall ▴ 10

0

Entering edit mode

Hi Andrea Wall,

I am facing the same problem as yours regarding converting gene names to Entrez gene IDs. If you had found the solution, would you please share it here? Thanks in advance, Arumoy.

ADD REPLY • link 3.5 years ago by chatterjee.arumoy ▴ 50