gseGO() --> No gene can be mapped
8 months ago
juliette • 0

Dear all,

I have a list of S. cerevisiae genes and I want to do GO enrichment analysis using clusterProfiler. I already obtained some information using enrichGO() and groupGO(), and I want to see what I can obtain with gseGO().

I use the package org.Sc.sgd.db as my organism database. Here is my script:

library(clusterProfiler)
library(DOSE)
library(org.Sc.sgd.db)
library(readxl)

gene_list <- "~/path/to/my/list.xlsx"
genes <- read_xlsx(gene_list)

gene_ids <- mapIds(org.Sc.sgd.db, keys = genes$GeneID, column = "ENTREZID", keytype = "COMMON")
geneList <- gene_ids[order(gene_ids, decreasing = TRUE)]
data(geneList, package = "DOSE")

gseGO(geneList = geneList, ont = "BP", OrgDb = org.Sc.sgd.db, keyType = "ENTREZID")

And I obtain this error message:

preparing geneSet collections...
--> Expected input gene ID: 855471,855490,850303,855645,851691,854818
Error in check_gene_id(geneList, geneSets) : 
  --> No gene can be mapped....

I tried converting the ENTREZID values to numeric, but it didn't change the result. I also tried GENENAME instead of ENTREZID, with the same result. Since I have some NA values, I also tried removing them, but again, nothing changed.
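A base-R sketch of what those two steps do to the result of mapIds(). The vector below is a small stand-in built from the IDs printed later in this thread, with one hypothetical unmapped gene (`GENE_X`) added as NA. It shows that as.numeric() drops the names, and that removing NAs with a logical index keeps them; the `x[-which(is.na(x))]` idiom also has a pitfall when there are no NAs at all.

```r
# Stand-in for the named character vector returned by mapIds():
# names are gene symbols, values are Entrez IDs.
# "GENE_X" is a hypothetical placeholder for a gene that did not map.
gene_ids <- c(FLO9 = "851236", FLO1 = "851289", CDC27 = "852194",
              GENE_X = NA)

# as.numeric() returns an unnamed vector, so the symbol names are lost
ids_num <- as.numeric(gene_ids)
print(names(ids_num))  # NULL

# Removing NAs with a logical index keeps the names and is safe
# even when the vector contains no NA at all
ids_clean <- gene_ids[!is.na(gene_ids)]
print(ids_clean)

# Pitfall: when which() finds nothing, negative indexing with its
# result selects nothing, so the whole vector is dropped
no_na <- c(a = 1, b = 2)
print(length(no_na[-which(is.na(no_na))]))  # 0
```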

From the error message, I understand that the function doesn't recognise my IDs as being in the right format, but as far as I can tell they are.

Here is an excerpt of my vector, converted to numeric and with the NA values removed:

gene_ids
  [1] 851236 851289 852194 852218 852292 852305 852318 852366 852410 852445 852447 852567 850312
 [14] 850377 850398 851359 851404 851430 851667 851697 851698 851713 851727 851746 851762 851781
 [27] 851788 851857 851911 851917 851918 852030 852077 852106 852123 856640 856711 856830 856862
 [40] 850559 850608 852637 852667 852703 852803 852905 852987 853083 853116 853145 856339 856364

If anyone has any advice on what I could try, it would be really useful for my work.

Thank you, Juliette

R clusterProfiler GO Ontology • 1.1k views

To my knowledge, the geneList from the DOSE package contains human genes and cannot be used with S. cerevisiae GO annotations. This is why you get the error. Create a geneList in the same format from your own input genes.
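A minimal base-R check, assuming the format of the DOSE example (a numeric vector whose names are gene IDs, with no NAs, sorted in decreasing order), that can be run on a home-made geneList before calling gseGO(). `is_gsea_ready()` is a hypothetical helper written for illustration; it is not part of DOSE or clusterProfiler, and the scores in the usage lines are made up.

```r
# Hypothetical helper: TRUE when x looks like the DOSE example geneList
# (numeric values, non-empty names, no NAs, decreasing order)
is_gsea_ready <- function(x) {
  is.numeric(x) &&
    !is.null(names(x)) &&
    !anyNA(names(x)) && all(nzchar(names(x))) &&
    !anyNA(x) &&
    !is.unsorted(rev(x))  # reversed decreasing vector is non-decreasing
}

# Usage: names = Entrez IDs, values = a ranking statistic (made-up numbers)
ok  <- c("851236" = 2.1, "851289" = 0.4, "852194" = -1.3)
# Entrez IDs stored as values with no names, as in the original attempt
bad <- c(851236, 851289, 852194)

is_gsea_ready(ok)   # TRUE
is_gsea_ready(bad)  # FALSE
```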


Yes, indeed, that was a silly mistake on my part. However, I have removed the line data(geneList, package = "DOSE") and I still get the error.

I added more details in another comment. Thank you for your help!


geneList should be a named numeric vector (see the example from the DOSE package):

> data(geneList, package = "DOSE")
> head(geneList)
    4312     8318    10874    55143    55388      991 
4.572613 4.514594 4.418218 4.144075 3.876258 3.677857 

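The names of the vector should be the Entrez IDs of your genes, and the values a ranking statistic. In the code you provided below, it is the other way round: the Entrez IDs are the values, while the gene symbols are the names.

names(gene_ids) = gene_ids should get rid of the mapping error (assuming your list is already ranked), but make sure your list is ranked in a way that is meaningful for GSEA: the values should be a statistic such as a fold change, sorted in decreasing order, since sorting by the Entrez IDs themselves gives an arbitrary ranking.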
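A base-R sketch of building such a vector from a mapIds()-style result. The Entrez IDs are the ones shown in this thread for FLO9, FLO1, CDC27 and SKT5; the scores are hypothetical stand-ins for whatever statistic your spreadsheet holds (e.g. a log2 fold change column).

```r
# Named character vector as returned by mapIds(): symbol -> Entrez ID
entrez <- c(FLO9 = "851236", FLO1 = "851289", CDC27 = "852194",
            SKT5 = "852218")

# Hypothetical ranking statistic per gene symbol (e.g. log2 fold change)
score <- c(FLO9 = 1.8, FLO1 = 3.2, CDC27 = -0.7, SKT5 = 0.5)

# Keep genes that mapped to an Entrez ID and have a non-missing score
keep <- !is.na(entrez) & !is.na(score[names(entrez)])
geneList <- score[names(entrez)[keep]]

# Entrez IDs go in the names, the statistic stays in the values
names(geneList) <- entrez[keep]

# Rank by the statistic, not by the IDs
geneList <- sort(geneList, decreasing = TRUE)
print(geneList)
```

The resulting vector can then be passed to the same call as before, gseGO(geneList = geneList, ont = "BP", OrgDb = org.Sc.sgd.db, keyType = "ENTREZID").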


I didn't know the vector had to be named, thank you so much! Now I get other error messages, but I will try to fix them myself. I still have a lot to learn about this package.

8 months ago
First off, you go through the effort of creating your own gene list, but then load up the one provided by DOSE, which will overwrite your geneList object. Second, that example geneList contains human Entrez IDs, so trying to map it to S. cerevisiae isn't going to return anything.
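A rough base-R illustration of why "No gene can be mapped" appears, assuming (as the "Expected input gene ID" line suggests) that the check compares the names of geneList with the IDs in the S. cerevisiae gene sets. `n_mapped()` is a hypothetical helper, not clusterProfiler's actual code; the IDs come from the outputs in this thread and the yeast scores are made up.

```r
# A few IDs from the yeast gene sets, as printed in the error message
yeast_set_ids <- c("855471", "855490", "850303", "855645", "851691", "854818")

# Start of the human DOSE example geneList (Entrez IDs shown in this thread)
human_list <- c("4312" = 4.572613, "8318" = 4.514594, "10874" = 4.418218)

# An unnamed vector of yeast Entrez IDs, like the one from as.numeric()
unnamed_list <- c(851236, 851289, 852194)

# Yeast Entrez IDs in the names (made-up scores)
yeast_list <- c("855471" = 1.2, "850303" = -0.4)

# Hypothetical helper: how many geneList names occur among the gene set IDs
n_mapped <- function(gl, ids) sum(names(gl) %in% ids)

n_mapped(human_list, yeast_set_ids)    # 0: human IDs never match yeast IDs
n_mapped(unnamed_list, yeast_set_ids)  # 0: there are no names to match
n_mapped(yeast_list, yeast_set_ids)    # 2
```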

We can't see your input data, so we can't tell whether it's correct, but remove the data(geneList, package = "DOSE") line and try again.


Thank you for taking the time to help me. You're right about the DOSE line; I was panicking and trying anything to get a result. Here is more information:

> gene_list <- paste("/Users/graffj/Desktop/Results-Supp-2/supp2/gene_id_", ref_sample, ".xlsx", sep = "")
> genes <- read_xlsx(gene_list)
New names:                                                                                     
• `` -> `...1`
> genes <- as.data.frame(genes[, 2:3])
> head(genes$GeneID)
[1] "FLO9"   "FLO1"   "CDC27"  "SKT5"   "DSF2"   "KAP104"

I removed the line, but I still get the error:

> gene_ids <- mapIds(org.Sc.sgd.db, keys = genes$GeneID, column = "ENTREZID", keytype = "COMMON")
'select()' returned 1:1 mapping between keys and columns
> head(gene_ids)
FLO9     FLO1    CDC27     SKT5     DSF2   KAP104 
"851236" "851289" "852194" "852218" "852292" "852305"

> gene_ids <- as.numeric(gene_ids)
> head(gene_ids)
[1] 851236 851289 852194 852218 852292 852305

> gene_ids <- gene_ids[-which(is.na(gene_ids))]
> geneList <- gene_ids[order(gene_ids, decreasing = T)]
> head(geneList)
[1] 856862 856830 856711 856640 856459 856429

> gseGO(geneList= geneList, ont = "BP", OrgDb = org.Sc.sgd.db, keyType = "ENTREZID")
preparing geneSet collections...
--> Expected input gene ID: 855929,851040,852641,852106,850451,854504
Error in check_gene_id(geneList, geneSets) : 
  --> No gene can be mapped....

I tried to give as much detail as I could; if something is missing, please let me know. Thank you again for helping me.
