Question: Mapping mouse gene symbols to Entrez IDs in GAGE
bojingjia10 wrote:

I've come across many posts about common errors using GAGE, and many of these common pitfalls relate to mismatching ID systems (Entrez gene ID, gene symbol, etc). I've read the "Gene set and data preparation" vignette, but still get errors when I try to convert my gene symbols to Entrez IDs.

I have two questions:

1. Is there a more "efficient" way to map gene symbols to Entrez IDs? For example, of 38720 unique input IDs, 8850 remain unmapped. I am using the mouse data set and trying to map the gene symbols in my featureCounts output.

2. What does it really mean when I fail to download xml/png files for my GAGE analysis? I get errors like: 

Info: Downloading xml files for hsammu04060, 1/1 pathways..
Warning: Download of hsammu04060 xml file failed!
This pathway may not exist!

Thanks in advance.

rna-seq gsea pathview deseq2 gage • 2.6k views
modified 3.3 years ago by bigmawen310 • written 3.3 years ago by bojingjia10
## Load required libraries
library("DESeq2")
library("gage")
library("pathview")

## Combine count files into dataframe
# Import data from featureCounts
countdata <- read.table("wt_CEvsRT.txt", header=TRUE, row.names=1)

# Convert to matrix
countdata <- as.matrix(countdata)
head(countdata)

# Assign condition
sampleCondition <- c("RT", "RT", "RT", "CE", "CE", "CE")

# Analysis with DESeq2 ----------------------------------------------------
# Create a coldata frame and instantiate the DESeqDataSet. See ?DESeqDataSetFromMatrix
(coldata <- data.frame(row.names=colnames(countdata), sampleCondition))
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~sampleCondition)

## Run DESeq normalization
dds<-DESeq(dds)

##from GAGE

deseq2.res <- results(dds)
deseq2.fc=deseq2.res$log2FoldChange
names(deseq2.fc)=rownames(deseq2.res)
exp.fc=deseq2.fc
out.suffix="deseq2"

require(gage)
data(kegg.gs)

#get the annotation files for mouse

kg.mouse<- kegg.gsets("mouse")
kegg.gs<- kg.mouse$kg.sets[kg.mouse$sigmet.idx]

#convert gene symbol to entrez ID

gene.symbol.eg<- id2eg(ids=names(exp.fc), category='SYMBOL', org='Mm')

names(exp.fc)<- gene.symbol.eg[,2]

fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)
sel <- fc.kegg.p$greater[, "q.val"] < 0.2 & !is.na(fc.kegg.p$greater[, "q.val"])
path.ids <- rownames(fc.kegg.p$greater)[sel]
sel.l <- fc.kegg.p$less[, "q.val"] < 0.2 & !is.na(fc.kegg.p$less[,"q.val"])
path.ids.l <- rownames(fc.kegg.p$less)[sel.l]
path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)
require(pathview)
#view first 3 pathways as demo
pv.out.list <- sapply(path.ids2[1:3], function(pid) pathview(gene.data = exp.fc, pathway.id = pid,species = "hsa", out.suffix=out.suffix))

modified 3.3 years ago • written 3.3 years ago by bojingjia10

I don't know if it is the cause of all your problems, but you should be using species = "mmu" on your pathview() call.

written 3.3 years ago by h.mon24k

Thanks! That solved the errors. I am still unable to map all of the gene symbols, though. Do you have any suggestions?

written 3.3 years ago by bojingjia10

No, I do not have any (easy) suggestions. In fact, the situation is probably worse if you use org.Mm.eg.db and do:

library(org.Mm.eg.db)
gene.symbol.eg <- select(org.Mm.eg.db, keys = names(exp.fc), columns = "ENTREZID", keytype = "SYMBOL")

You will probably find a "1:many mapping", indicating that some gene symbols map to multiple Entrez IDs. See here and here for discussions and suggestions.
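
One possible way to deal with a 1:many result is sketched below. This is only an illustration under assumptions: the data frame is a toy stand-in for what select() returns, the symbols and IDs are made up, and "keep the first ID per symbol" is just one simple policy, not necessarily the right one for every data set.

```r
# Toy stand-in for the data frame returned by select(); all values are made up.
map <- data.frame(
  SYMBOL   = c("GeneA", "GeneB", "GeneB", "GeneC"),
  ENTREZID = c("101", "202", "203", NA),
  stringsAsFactors = FALSE
)

# Drop symbols that have no Entrez ID at all.
map <- map[!is.na(map$ENTREZID), ]

# For symbols with several Entrez IDs, keep only the first one listed.
map.unique <- map[!duplicated(map$SYMBOL), ]

# Named lookup vector: gene symbol -> Entrez ID.
sym2eg <- setNames(map.unique$ENTREZID, map.unique$SYMBOL)
print(sym2eg)
```

With the real output, it is probably worth looking at the dropped rows before discarding them, since the first ID listed is not guaranteed to be the one you want.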

written 3.3 years ago by h.mon24k
bigmawen310 wrote:

id2eg uses comprehensive gene annotation packages in Bioconductor. Almost all (if not all) official gene symbols can be mapped to Entrez Gene IDs this way. You should check that the unmapped gene symbols are “official”, as they might be synonyms, other types of gene IDs, or transcript IDs. That said, about 30000 genes in your data are mapped. Pathway analysis with that should still be very informative.
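
A rough sketch of how such a check might start is shown below. It assumes you have already collected the unmapped IDs into a character vector; the IDs here are placeholders for illustration, and classify.id is a hypothetical helper, not part of gage or pathview. It only separates Ensembl-style and RefSeq-style IDs from symbol-like strings by their prefixes, so it cannot tell an official symbol from a synonym.

```r
# Placeholder unmapped IDs for illustration; replace with your own vector.
unmapped <- c("ENSMUSG00000000001", "NM_000001", "Gm00000", "ENSMUST00000000001")

# Hypothetical helper: guess the ID type from its prefix.
classify.id <- function(id) {
  if (grepl("^ENSMUS[GT][0-9]+", id)) {
    "Ensembl ID"
  } else if (grepl("^[NX][MR]_[0-9]+", id)) {
    "RefSeq transcript ID"
  } else {
    "symbol-like"
  }
}

# One label per unmapped ID, named by the ID itself.
id.type <- vapply(unmapped, classify.id, character(1))

# Tally how many unmapped IDs fall into each class.
print(table(id.type))
```

If most of the unmapped entries turn out to be symbol-like, they may be synonyms or unofficial symbols, which would need a different lookup than an official-symbol mapping.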

BTW, for your error message, species = "mmu" is the solution. When species is not set, the default (hsa, i.e. human) will be used. Hence you get funny pathway names like hsammu04060, and of course you are not able to download anything for these “pathways”.
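
To make the naming issue concrete, here is a small base R sketch. It assumes the gage result row names start with the KEGG pathway ID, as the substr() call in the question implies. The paste0() line only imitates the ID seen in the log and is not meant to describe pathview's internals. The pathview() call is left commented out because it needs the real exp.fc and network access.

```r
# A gage-style row name for the pathway from the log above.
path.names <- c("mmu04060 Cytokine-cytokine receptor interaction")

# Same extraction as in the question: the first 8 characters are the pathway ID.
path.ids2 <- substr(path.names, 1, 8)     # "mmu04060"

# Split the ID into species code and numeric part.
species <- substr(path.ids2, 1, 3)        # "mmu"
digits  <- sub("^[a-z]+", "", path.ids2)  # "04060"

# Imitation of the mismatch: a human prefix in front of a mouse ID.
bad.id <- paste0("hsa", path.ids2)        # "hsammu04060"
print(bad.id)

# With a matching species code the ID is consistent:
# pathview(gene.data = exp.fc, pathway.id = digits, species = "mmu",
#          out.suffix = out.suffix)
```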

written 3.3 years ago by bigmawen310