Question: Mapping mouse gene symbols to Entrez IDs in GAGE
4.9 years ago by bojingjia (10), United States, wrote:

I've come across many posts about common errors using GAGE, and many of these common pitfalls relate to mismatching ID systems (Entrez gene ID, gene symbol, etc). I've read the "Gene set and data preparation" vignette, but still get errors when I try to convert my gene symbols to Entrez IDs.

I have two questions:

1. Is there a more "efficient" way to map gene symbols to Entrez IDs? For example, of 38720 unique input IDs, 8850 remain unmapped. I am using the mouse data set and trying to map the gene symbols in my featureCounts output.

2. What does it really mean when I fail to download xml/png files for my GAGE analysis? I get errors like: 

Info: Downloading xml files for hsammu04060, 1/1 pathways..
Warning: Download of hsammu04060 xml file failed!
This pathway may not exist!

Thanks in advance.

rna-seq gsea pathview deseq2 gage • 3.4k views
modified 4.9 years ago by bigmawen (340) • written 4.9 years ago by bojingjia (10)
## Load required libraries
library(DESeq2)
library(gage)
library(pathview)

## Combine count files into dataframe
# Import data from featureCounts
countdata <- read.table("wt_CEvsRT.txt", header=TRUE, row.names=1)

# Convert to matrix
countdata <- as.matrix(countdata)

# Assign condition
sampleCondition <- c("RT", "RT", "RT", "CE", "CE", "CE")

# Analysis with DESeq2 ----------------------------------------------------
# Create a coldata frame and instantiate the DESeqDataSet. See ?DESeqDataSetFromMatrix
(coldata <- data.frame(row.names=colnames(countdata), sampleCondition))
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~sampleCondition)

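## Run DESeq normalization
dds <- DESeq(dds)

## from GAGE
deseq2.res <- results(dds)
# Assumed: exp.fc holds the log2 fold changes named by gene symbol,
# as in the GAGE RNA-Seq workflow vignette
exp.fc <- deseq2.res$log2FoldChange
names(exp.fc) <- rownames(deseq2.res)
# Assumed value: out.suffix is used in the pathview() call but not defined in the post
out.suffix <- "deseq2"
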
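# get the annotation files for mouse
kg.mouse <- kegg.gsets("mouse")
# keep signaling and metabolic pathways only (object name kegg.gs is assumed)
kegg.gs <- kg.mouse$kg.sets[kg.mouse$sigmet.idx]

# convert gene symbol to Entrez ID (object name gene.symbol.eg is assumed)
gene.symbol.eg <- id2eg(ids=names(exp.fc), category='SYMBOL', org='Mm')


fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)
# up-regulated pathways with q-value below 0.2, ignoring NA q-values
sel <- fc.kegg.p$greater[, "q.val"] < 0.2 & !is.na(fc.kegg.p$greater[, "q.val"])
path.ids <- rownames(fc.kegg.p$greater)[sel]
# down-regulated pathways with q-value below 0.2, ignoring NA q-values
sel.l <- fc.kegg.p$less[, "q.val"] < 0.2 & !is.na(fc.kegg.p$less[, "q.val"])
path.ids.l <- rownames(fc.kegg.p$less)[sel.l]
# the first 8 characters of each gene set name are the KEGG pathway ID
path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)
# view first 3 pathways as demo
pv.out.list <- sapply(path.ids2[1:3], function(pid) pathview(gene.data = exp.fc, pathway.id = pid, species = "hsa", out.suffix = out.suffix))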
modified 9 months ago by RamRS (30k) • written 4.9 years ago by bojingjia (10)

I don't know if it is the cause of all your problems, but you should be using species = "mmu" on your pathview() call.

written 4.9 years ago by h.mon (31k)

Thanks! That solved the errors. I am still unable to map all of the gene symbols. Do you have any suggestions?

written 4.9 years ago by bojingjia (10)

No, I do not have any (easy) suggestions. In fact, the situation is probably worse. If you use org.Mm.eg.db and do: res <- select(org.Mm.eg.db, keys=names(exp.fc), columns="ENTREZID", keytype="SYMBOL")

you will probably find a "1:many mapping", indicating that some gene names have multiple IDs. See here and here for discussions and suggestions.
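One possible way to deal with a 1:many result is sketched below in base R. It runs on a toy data frame shaped like the SYMBOL/ENTREZID table that select() returns; the object names (`sel.out`, `first.hit`, `unique.hit`) and the gene names and IDs are made up for illustration and are not from the original post. Which policy is appropriate (keep the first hit, or drop ambiguous symbols) is a judgment call.

```r
# Toy stand-in for a select() result: one row per symbol/Entrez ID pair.
# GeneB maps to two IDs (1:many); GeneC is unmapped (NA). All values are made up.
sel.out <- data.frame(
  SYMBOL   = c("GeneA", "GeneB", "GeneB", "GeneC"),
  ENTREZID = c("101", "202", "203", NA),
  stringsAsFactors = FALSE
)

# Count how many Entrez IDs each mapped symbol received
n.ids <- table(sel.out$SYMBOL[!is.na(sel.out$ENTREZID)])
ambiguous <- names(n.ids)[as.vector(n.ids) > 1]

# Option 1: keep only the first row for each symbol
first.hit <- sel.out[!duplicated(sel.out$SYMBOL), ]

# Option 2: drop unmapped and ambiguous symbols entirely
unique.hit <- sel.out[!is.na(sel.out$ENTREZID) &
                        !(sel.out$SYMBOL %in% ambiguous), ]

print(ambiguous)
print(first.hit)
print(unique.hit)
```

Option 1 keeps every symbol but picks an arbitrary ID for the ambiguous ones; option 2 is stricter and loses those genes.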

modified 9 months ago by RamRS (30k) • written 4.8 years ago by h.mon (31k)

4.9 years ago by bigmawen (340), United States, wrote:

id2eg uses the comprehensive gene annotation packages in Bioconductor. Almost all (if not all) official gene symbols can be mapped to Entrez Gene IDs this way. You should check that the unmapped gene symbols are "official", as they might be synonyms, other types of gene IDs, or transcript IDs. That said, about 30000 genes (38720 minus 8850) are mapped in your data. Pathway analysis with those should still be very informative.
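To check which symbols failed to map, and to rename the fold-change vector by Entrez ID before calling gage(), something like the following may help. This is a minimal base-R sketch on a toy matrix shaped like the two-column (input ID, Entrez ID) result of id2eg, with NA for unmapped symbols; the object names (`map`, `unmapped`, `exp.fc.eg`), the toy symbols, and the fold-change values are assumptions for illustration only.

```r
# Toy stand-in for an id2eg() result: column 1 = input symbol,
# column 2 = Entrez ID (NA when the symbol could not be mapped).
map <- cbind(
  SYMBOL   = c("Trp53", "Actb", "NotASymbol", "Gapdh"),
  ENTREZID = c("22059", "11461", NA, "14433")
)

# Toy log2 fold changes named by gene symbol (made-up values)
exp.fc <- c(Trp53 = 1.2, Actb = -0.1, NotASymbol = 0.8, Gapdh = 0.3)

# Which input symbols failed to map? Inspect these for synonyms,
# transcript IDs, or other non-official identifiers.
unmapped <- map[is.na(map[, 2]), 1]
print(unmapped)

# Fraction of input symbols that mapped
mapped.rate <- mean(!is.na(map[, 2]))
print(mapped.rate)

# Keep mapped symbols with a non-duplicated Entrez ID, then rename by Entrez ID
keep <- !is.na(map[, 2]) & !duplicated(map[, 2])
exp.fc.eg <- exp.fc[map[keep, 1]]
names(exp.fc.eg) <- map[keep, 2]
print(exp.fc.eg)
```

For the numbers in the question, (38720 - 8850) / 38720 gives a mapped fraction of roughly 0.77.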

BTW, for your error message, species = "mmu" is the solution. When species is not set to match your data, the human code (hsa, the default) is used. Hence you get funny pathway names like hsammu04060, and of course you are not able to download anything for these "pathways".
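The odd ID can be reproduced with a few lines of base R. The helper `build.kegg.id` below is hypothetical: it only mimics how the ID in the warning appears to be assembled (species code pasted in front of the pathway ID after stripping a matching prefix) and is not pathview's actual code.

```r
# A mouse gene set name as returned by kegg.gsets(); the first 8 characters are the ID
gset.name <- "mmu04060 Cytokine-cytokine receptor interaction"
path.id <- substr(gset.name, 1, 8)

# Hypothetical helper: strip a matching species prefix, then prepend the species code
build.kegg.id <- function(pathway.id, species) {
  paste0(species, sub(paste0("^", species), "", pathway.id))
}

# Mismatched species: the "mmu" prefix is not stripped, so the codes pile up
wrong.id <- build.kegg.id(path.id, "hsa")
# Matching species: the prefix is stripped and re-added once
right.id <- build.kegg.id(path.id, "mmu")

print(wrong.id)
print(right.id)
```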

written 4.9 years ago by bigmawen (340)