gseKEGG with Streptomyces coelicolor: No gene can be mapped
Posted by r.evans2, 13 months ago

I am trying to use the gseKEGG function in the R package clusterProfiler with the KEGG database entries for Streptomyces coelicolor.

This code snippet confirms that KEGG supports this organism, with the KEGG organism code 'sco':

search_kegg_organism('sco', by='kegg_code')

returns

     kegg_code                                  scientific_name common_name
2810      scon Streptococcus constellatus subsp. pharyngis C232        <NA>
2811      scos Streptococcus constellatus subsp. pharyngis C818        <NA>
3495       sco                          Streptomyces coelicolor        <NA>
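Note that the search matches the code as a substring, which is why scon and scos are returned alongside sco. A minimal base-R sketch of an exact lookup follows; the data frame simply re-types the three rows above (common_name omitted), so it does not query KEGG or call clusterProfiler.

```r
# Re-typed copy of the three rows returned above (common_name omitted);
# this does not query KEGG.
orgs <- data.frame(
  kegg_code = c("scon", "scos", "sco"),
  scientific_name = c("Streptococcus constellatus subsp. pharyngis C232",
                      "Streptococcus constellatus subsp. pharyngis C818",
                      "Streptomyces coelicolor"),
  row.names = c("2810", "2811", "3495"),
  stringsAsFactors = FALSE
)

# An exact comparison keeps only the organism whose code is exactly 'sco'
exact <- orgs[orgs$kegg_code == "sco", , drop = FALSE]
print(exact$scientific_name)
```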

Examining the KEGG genome entry T00085 at https://www.genome.jp/dbget-bin/get_linkdb?-t+genes+gn:T00085 seems to confirm that the gene IDs in the KEGG database use the widely used SCOnnnn format, which is conveniently the format used in my datasets. The first few lines from the database are reproduced below:

sco:SCO0001 no KO assigned | (RefSeq) SCEND.02c; hypothetical protein
sco:SCO0002 no KO assigned | (RefSeq) SC8E7.42c, SCEND.01c, SCJ24.01c; hypothetical protein
sco:SCO0003 no KO assigned | (RefSeq) SC8E7.41c; DNA-binding protein
sco:SCO0004 no KO assigned | (RefSeq) SC1C9.01, SC8E7.40c; hypothetical protein
sco:SCO0005 no KO assigned | (RefSeq) SC1C9.02; transposase
sco:SCO0006 no KO assigned | (RefSeq) SC1C9.03, SCJ30.01; ATP/GTP-binding protein
sco:SCO0007 no KO assigned | (RefSeq) SCJ30.02c; hypothetical protein
sco:SCO0008 no KO assigned | (RefSeq) SCJ30.03c; hypothetical protein
sco:SCO0009 no KO assigned | (RefSeq) SCJ30.04c; hypothetical protein
sco:SCO0010 no KO assigned | (RefSeq) SCJ30.05; hypothetical protein
sco:SCO0011 no KO assigned | (RefSeq) SCJ30.06c; hypothetical protein
sco:SCO0012 no KO assigned | (RefSeq) SCJ30.07c; hypothetical protein
sco:SCO0013 no KO assigned | (RefSeq) SCJ30.09c; hypothetical protein
sco:SCO0014 no KO assigned | (RefSeq) SCJ30.10c; hypothetical protein
sco:SCO0015 K03313 Na+:H+ antiporter, NhaA family | (RefSeq) SCJ30.11c; Na+/H+ antiporter
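To double-check that stripping the `sco:` prefix from lines in this format leaves IDs of the same form as the names in my geneList, here is a minimal base-R sketch. The helper `parse_kegg_gene_lines` is hypothetical, and it assumes every line starts with `org:GENEID` followed by either a Knnnnn ID or the text "no KO assigned", as in the extract above.

```r
# Hypothetical helper: split KEGG gene-list lines of the form
# "org:GENEID <KO or 'no KO assigned'> | description"
# into organism code, bare gene ID and KO (NA when no KO is assigned).
parse_kegg_gene_lines <- function(lines) {
  # First whitespace-delimited token, e.g. "sco:SCO0015"
  first_token <- sub("\\s.*$", "", lines, perl = TRUE)
  org  <- sub(":.*$", "", first_token, perl = TRUE)
  gene <- sub("^[^:]+:", "", first_token, perl = TRUE)
  # Remainder of the line after the first token
  rest <- sub("^\\S+\\s+", "", lines, perl = TRUE)
  ko   <- ifelse(grepl("^K\\d{5}\\b", rest, perl = TRUE),
                 substr(rest, 1, 6), NA_character_)
  data.frame(org = org, gene = gene, ko = ko, stringsAsFactors = FALSE)
}

lines <- c(
  "sco:SCO0014 no KO assigned | (RefSeq) SCJ30.10c; hypothetical protein",
  "sco:SCO0015 K03313 Na+:H+ antiporter, NhaA family | (RefSeq) SCJ30.11c; Na+/H+ antiporter"
)
parsed <- parse_kegg_gene_lines(lines)
print(parsed)
```

The bare IDs (SCO0014, SCO0015) have the same form as the names used in the example below, so the ID format itself does not look like the obvious culprit.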

However, the following code fails (this is a minimal example to demonstrate the problem; my real geneList has thousands of genes):

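library(clusterProfiler)

# Named numeric vector of gene-level statistics, named by KEGG gene ID
geneList <- c(0.5, 0.1, 1)
names(geneList) <- c('SCO0015', 'SCO0033', 'SCO0039')

# gseKEGG expects the vector sorted in decreasing order
geneList <- sort(geneList, decreasing = TRUE)

# KEGG GSEA for Streptomyces coelicolor (KEGG organism code 'sco')
kk2 <- gseKEGG(geneList = geneList, organism = 'sco',
               minGSSize = 1, pvalueCutoff = 1, verbose = FALSE)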

It generates the error

--> Expected input gene ID:
Error in check_gene_id(geneList, geneSets) :
  --> No gene can be mapped....

To me this suggests that none of the three genes in geneList can be found in the database. I get the same error with a geneList of 7000+ genes, and the three genes in this example were chosen because I know they have a KO (Knnnnn) assigned in KEGG; e.g. SCO0015 maps to K03313 in the database extract above.
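One way to test that interpretation directly is to reproduce the overlap check by hand. As far as I can tell, the error is raised when none of names(geneList) occurs in any of the gene sets, so the minimal sketch below reports mapped and unmapped IDs against a hypothetical toy list of gene sets (the pathway names and members are made up). An empty gene-set list fails in exactly the same way regardless of the input IDs, which may be worth ruling out given that the "Expected input gene ID:" line above appears to list no example IDs. With real data, the pathway-to-gene table might be obtained from clusterProfiler's download_KEGG('sco') (its KEGGPATHID2EXTID element), but treat that as an assumption to verify against the installed version.

```r
# Hypothetical helper mirroring (as I understand it) the condition behind
# "No gene can be mapped": none of names(geneList) occurs in any gene set.
report_id_overlap <- function(geneList, geneSets) {
  universe <- unique(unlist(geneSets, use.names = FALSE))
  hits <- names(geneList) %in% universe
  list(n_sets   = length(geneSets),
       n_genes  = length(universe),
       mapped   = names(geneList)[hits],
       unmapped = names(geneList)[!hits])
}

geneList <- sort(c(SCO0015 = 0.5, SCO0033 = 0.1, SCO0039 = 1), decreasing = TRUE)

# Toy gene sets: pathway names and members are made up for illustration
toy_sets <- list(pathway_A = c("SCO0015", "SCO0100"),
                 pathway_B = c("SCO0039", "SCO0200"))

with_sets <- report_id_overlap(geneList, toy_sets)
no_sets   <- report_id_overlap(geneList, list())  # empty gene sets: nothing can map
str(with_sets)
str(no_sets)
```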

Any ideas what I am doing wrong, or how I can resolve this?

Thanks

Tags: streptomyces, KEGG, clusterProfiler