Question: Pathway analysis gene name issue
0
gravatar for dazhudou1122
15 months ago by
dazhudou112240
dazhudou112240 wrote:

Dear BioStars Community,

I am a beginner of pathways analysis, and I ran into some issue using gage and pathview.

I mapped the RNA-seq (mouse cells) using STAR and counted the feature using featureCounts:

featureCounts -T 16 -t exon -g gene_id -a /qbrc/biodata/igenome/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf *.sam

So the row names are gene_ids. I then ran DESeq2, gage and pathview, but I encountered the following error:

Info: Downloading xml files for mmuNA, 1/1 pathways..

Warning: Download of mmuNA xml file failed!
This pathway may not exist!
Info: Downloading png files for mmuNA, 1/1 pathways..

Warning: Download of mmuNA png file failed!
This pathway may not exist!
Warning: Failed to download KEGG xml/png files, mmuNA skipped!
Warning messages:
1: In download.file(xml.url, xml.target, quiet = T) :
  URL 'http://rest.kegg.jp/get/mmuNA/kgml': status was '400 Bad Request'
2: In download.file(xml.url, xml.target, quiet = T) :
  URL 'http://rest.kegg.jp/get/mmuNA/kgml': status was '400 Bad Request'
3: In download.file(xml.url, xml.target, quiet = T) :
  URL 'http://rest.kegg.jp/get/mmuNA/kgml': status was '400 Bad Request'

I thought maybe my gene_id was the issue, so I tried conversion:

 id.map.refseq <- id2eg(ids = rownames(dds), category ="SYMBOL", org = "mmu")

But that did not work well:

'select()' returned 1:1 mapping between keys and columns
[1] "Note: 24046 of 24062 unique input IDs unmapped."

Can anyone help me solve the issue? Any input will be appreciated!

Best,

Wenhan

The codes I ran:

## Run DESeq normalization
countdata <- read.table("B6_vs_PPARA_count.txt", header=TRUE, row.names=1)
countdata <- countdata[ ,6:ncol(countdata)]
colnames(countdata) <- gsub("\\.[sb]am$", "", colnames(countdata))
countdata <- as.matrix(countdata)
head(countdata)
(condition <- factor(c(rep("ctl", 3), rep("exp", 3))))
library(DESeq2)
(coldata <- data.frame(row.names=colnames(countdata), condition))
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition)
dds
dds <- DESeq(dds)

##from GAGE

deseq2.res <- results(dds)
deseq2.fc=deseq2.res$log2FoldChange
names(deseq2.fc)=rownames(deseq2.res)
exp.fc=deseq2.fc
out.suffix="deseq2"

require(gage)
datakegg.gs)

#get the annotation files for mouse

kg.mouse<- kegg.gsets("mouse")
kegg.gs<- kg.mouse$kg.sets[kg.mouse$sigmet.idx]

#convert gene symbol to entrez ID

gene.symbol.eg<- id2eg(ids=names(exp.fc), category='SYMBOL', org='Mm')

names(exp.fc)<- gene.symbol.eg[,2]

fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)
sel <- fc.kegg.p$greater[, "q.val"] < 0.2 & !is.na(fc.kegg.p$greater[, "q.val"])
path.ids <- rownames(fc.kegg.p$greater)[sel]
sel.l <- fc.kegg.p$less[, "q.val"] < 0.2 & !is.na(fc.kegg.p$less[,"q.val"])
path.ids.l <- rownames(fc.kegg.p$less)[sel.l]
path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)
require(pathview)
#view first 3 pathways as demo
pv.out.list <- sapply(path.ids2[1:3], function(pid) pathview(gene.data = exp.fc, pathway.id = pid,species = "mmu", out.suffix=out.suffix))
ADD COMMENTlink modified 15 months ago • written 15 months ago by dazhudou112240
2

Have you read carefully the messages ? Do you know what mmuNA is ? Then if you think the gene IDs are the issue, why don't you show us what they are ? Could it be they are not what you think they are or what the software expects ?

ADD REPLYlink written 15 months ago by Jean-Karim Heriche17k

I tried but I cant seem to find what mmuNA is....

Here are some examples of the GeneID I have: Geneid Xkr4 Rp1 Sox17 Mrpl15 Lypla1 Tcea1 Rgs20 Atp6v1h Oprk1 Npbwr1 Rb1cc1 Fam150a

ADD REPLYlink written 15 months ago by dazhudou112240
1

mmuNA looks suspiciously like the concatenation of mmu with NA, the symbol used for undefined/missing values in R. Investigating this might put you on the right track.

ADD REPLYlink written 15 months ago by Jean-Karim Heriche17k

Please use the Code button to make your post more readable.

101010 Button

ADD REPLYlink modified 15 months ago • written 15 months ago by h.mon21k

Thank you! I have modified accordingly:)

ADD REPLYlink written 15 months ago by dazhudou112240

could you paste head (rownames(dds),10) here?

ADD REPLYlink written 15 months ago by cpad011210k

Here it is:

[1] "Xkr4" "Rp1" "Sox17" "Mrpl15" "Lypla1" "Tcea1" "Rgs20" "Atp6v1h" "Oprk1" "Npbwr1"

Thank you:)

ADD REPLYlink written 15 months ago by dazhudou112240

can you try this and let us know:

id2eg(ids = toupper(rownames(dds)), category ="SYMBOL", org="mmu")

Let us know the ouput differences between:

id.map.refseq <- id2eg(ids = rownames(dds), category ="SYMBOL", org = "mmu")

and

id.map.refseq <- id2eg(ids = toupper(rownames(dds)), category ="SYMBOL", org="mmu")
ADD REPLYlink modified 15 months ago • written 15 months ago by cpad011210k

Dear cpad0112:

Thank you so much for helping!

Here are the results:

id2eg(ids = toupper(rownames(dds)), category ="SYMBOL", org="mmu")
[1] "Note: 8779 of 24062 unique input IDs unmapped."
 [ reached getOption("max.print") -- omitted 23562 rows ]

id.map.refseq <- id2eg(ids = rownames(dds), category ="SYMBOL", org = "mmu")
[1] "Note: 24046 of 24062 unique input IDs unmapped."

id.map.refseq <- id2eg(ids = toupper(rownames(dds)), category ="SYMBOL", org="mmu")
[1] "Note: 8779 of 24062 unique input IDs unmapped."

However, I still got the same error....

ADD REPLYlink written 15 months ago by dazhudou112240

It is weird that path.ids, path.ids.l and path.ids2 are all empty. This data set has more than 100 gens that are very significantly differently expressed...

ADD REPLYlink written 15 months ago by dazhudou112240
1

after running this?

id.map.refseq <- id2eg(ids = toupper(rownames(dds)), category ="SYMBOL", org="mmu")

[1] "Note: 8779 of 24062 unique input IDs unmapped."

earlier, ids were not getting mapped and with toupper, ~8800 ids got mapped. I guess you need to run the script with subset of your data rather than entire data and fix the code. I think you are following tutorial here: https://www.r-bloggers.com/tutorial-rna-seq-differential-expression-pathway-analysis-with-sailfish-deseq2-gage-and-pathview/. Try to emulate the same without much code changes to mouse. Example code works for humans and with some example data. `

ADD REPLYlink modified 15 months ago • written 15 months ago by cpad011210k

Why is your organism set to mmu in one place but Mm in another?

ADD REPLYlink written 15 months ago by igor7.1k

I tried, I used mmu, but no database was downloaded. And if you use mmu as species, then the program does not recognize...

ADD REPLYlink written 15 months ago by dazhudou112240
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1299 users visited in the last hour