Question: What is an NCBI Gene ID, where can I find it, and how do I convert it to an Entrez ID?
I am new to bioinformatics and have only recently started learning, so I would appreciate some guidance on gene IDs. I want to upload RNA-seq data to KEGG Exp to draw pathways based on a differential expression analysis. The upload file requires gene symbols and Gene IDs, which I do not know where to find for the microorganism I am working with, Clostridium beijerinckii ATCC 35702. From papers and other sources I understand that a Gene ID is a number that uniquely identifies a gene, but I cannot find where to look it up.

Thanks a lot

The genome page for this bacterium is available here. If you look for the "GFF" link (search the page for that word), you can find the annotation file for this genome.

GeneIDs are present in this file, but there are no gene symbols. Here is an example snippet from the file:

NZ_CP010086.2   Protein Homology        CDS     226406  228025  .       +       0       ID=cds207;Parent=gene235;Dbxref=Genbank:WP_017209683.1,GeneID:31661091;Name=WP_017209683.1;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:WP_008426658.1;product=phosphoenolpyruvate--protein phosphotransferase;protein_id=WP_017209683.1;transl_table=11
NZ_CP010086.2   RefSeq  gene    228170  229435  .       +       .       ID=gene236;Dbxref=GeneID:31661092;Name=LF65_RS01175;gbkey=Gene;gene_biotype=protein_coding;locus_tag=LF65_RS01
NZ_CP010086.2   Protein Homology        CDS     228170  229435  .       +       0       ID=cds208;Parent=gene236;Dbxref=Genbank:WP_041893490.1,GeneID:31661092;Name=WP_041893490.1;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:WP_017209682.1;product=hypothetical protein;protein_id=WP_041893490.1;transl_table=11
NZ_CP010086.2   RefSeq  gene    229596  230771  .       +       .       ID=gene237;Dbxref=GeneID:31661093;Name=LF65_RS01180;gbkey=Gene;gene_biotype=protein_coding;locus_tag=LF65_RS01
NZ_CP010086.2   Protein Homology        CDS     229596  230771  .       +       0       ID=cds209;Parent=gene237;Dbxref=Genbank:WP_041893492.1,GeneID:31661093;Name=WP_041893492.1;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:WP_011967550.1;product=beta-aspartyl-peptidase;protein_id=WP_041893492.1;transl_table=11
NZ_CP010086.2   RefSeq  gene    230842  231447  .       -       .       ID=gene238;Dbxref=GeneID:31661094;Name=LF65_RS01185;gbkey=Gene;gene_biotype=protein_coding;locus_tag=LF65_RS01
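A minimal Python sketch of pulling those GeneIDs out of the downloaded GFF. It assumes a standard tab-separated GFF3 file (the snippet above is displayed with spaces) and keys each GeneID by the gene's `locus_tag`, falling back to `Name` and then `ID`; the function names are hypothetical, not part of any NCBI tool.

```python
def parse_attributes(field):
    """Split a GFF3 column-9 string such as 'ID=gene236;Name=...' into a dict.

    Values are returned as-is (GFF3 percent-encoding is not decoded here).
    """
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def gene_ids_from_gff(lines):
    """Map locus_tag (or Name, or ID) -> NCBI GeneID for 'gene' rows that carry one."""
    mapping = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF comment/directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # only 'gene' features; CDS rows repeat the same GeneID
        attrs = parse_attributes(cols[8])
        key = attrs.get("locus_tag") or attrs.get("Name") or attrs.get("ID")
        # Dbxref may hold several comma-separated entries, e.g. Genbank:...,GeneID:...
        for xref in attrs.get("Dbxref", "").split(","):
            db, _, accession = xref.partition(":")
            if db == "GeneID":
                mapping[key] = accession
    return mapping


# Gene and CDS rows adapted from the snippet above (tab-separated, attributes shortened).
sample = [
    "NZ_CP010086.2\tRefSeq\tgene\t228170\t229435\t.\t+\t.\t"
    "ID=gene236;Dbxref=GeneID:31661092;Name=LF65_RS01175;gbkey=Gene",
    "NZ_CP010086.2\tProtein Homology\tCDS\t228170\t229435\t.\t+\t0\t"
    "ID=cds208;Parent=gene236;Dbxref=Genbank:WP_041893490.1,GeneID:31661092",
]
print(gene_ids_from_gff(sample))  # → {'LF65_RS01175': '31661092'}
```

To run it on the whole annotation, pass an open file instead of the sample list, e.g. `gene_ids_from_gff(open("genomic.gff"))`, where the filename is a placeholder for whatever the downloaded GFF is called.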
Thanks a lot for your comment, I really appreciate it. NZ_CP010086.2 is strain NCIMB14988, but I am looking for strain ATCC 35702 (accession CP006777.1), which I have not been able to find. Does that mean Gene ID information has not been submitted for this strain, and that I can use the NZ_CP010086.2 strain information as a reference instead? Or is there something I am missing?

Here is the page for ATCC 35702. The GFF file for this strain does not have GeneID or locus information.

NZ_CP006777.1   Protein Homology        CDS     31554   32342   .       +       0       ID=cds19;Parent=gene29;Dbxref=Genbank:WP_011967385.1;Name=WP_011967385.1;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:WP_011967385.1;product=Cof-type HAD-IIB family hydrolase;protein_id=WP_011967385.1;transl_table=11
NZ_CP006777.1   RefSeq  gene    32490   34085   .       -       .       ID=gene30;Name=CBS_RS00155;gbkey=Gene;gene_biotype=protein_coding;locus_tag=CBS_RS00155;old_locus_tag=Cbs_0021
NZ_CP006777.1   Protein Homology        CDS     32490   34085   .       -       0       ID=cds20;Parent=gene30;Dbxref=Genbank:WP_011967386.1;Name=WP_011967386.1;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:WP_015390206.1;product=heme ABC transporter ATP-binding protein;protein_id=WP_011967386.1;transl_table=11
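For a strain like this, whose gene rows carry no GeneID, a small table of `locus_tag`, `old_locus_tag` and GeneID (or `None`) at least shows which identifiers are available for matching RNA-seq results. This is a sketch under the same tab-separated GFF3 assumption as above; the helper name `locus_tag_table` is hypothetical.

```python
def locus_tag_table(lines):
    """Return (locus_tag, old_locus_tag, GeneID or None) for each 'gene' row."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF comment/directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # only 'gene' features are tabulated
        # Column 9 holds key=value pairs separated by ';' (values not percent-decoded).
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        gene_id = None
        for xref in attrs.get("Dbxref", "").split(","):
            if xref.startswith("GeneID:"):
                gene_id = xref[len("GeneID:"):]
        rows.append((attrs.get("locus_tag"), attrs.get("old_locus_tag"), gene_id))
    return rows


# The CBS_RS00155 gene row from the snippet above, tab-separated.
sample = [
    "NZ_CP006777.1\tRefSeq\tgene\t32490\t34085\t.\t-\t.\t"
    "ID=gene30;Name=CBS_RS00155;gbkey=Gene;gene_biotype=protein_coding;"
    "locus_tag=CBS_RS00155;old_locus_tag=Cbs_0021",
]
for locus_tag, old_tag, gene_id in locus_tag_table(sample):
    print(locus_tag, old_tag, gene_id)  # → CBS_RS00155 Cbs_0021 None
```

The `old_locus_tag` column may be worth keeping because older annotations and publications for a strain sometimes use the previous tags rather than the current RefSeq ones.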
Thanks a lot. Can you share the link to where you are finding this information? I go to NCBI and select Genome from the database drop-down menu, but I don't get the same hit as you. Why is that? Also, could you please guide me on how to convert a Gene ID to the corresponding Entrez ID?

Is it possible that a strain without Gene IDs will also lack corresponding Entrez IDs, or can it still have Entrez IDs?
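NCBI Gene IDs are the identifiers formerly called Entrez Gene IDs, so the number after `GeneID:` in the GFF `Dbxref` field is already the "Entrez ID" such tools ask for, and no numeric conversion is needed. It follows that a GFF with no GeneIDs offers no Entrez IDs either. To check what a given ID refers to, NCBI's E-utilities ESummary endpoint can be queried. The sketch below builds the request URL and optionally fetches it; the `name` field read in `fetch_gene_summaries` is an assumption about the gene summary layout and should be checked against a real response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def gene_summary_url(gene_ids):
    """Build an ESummary request URL for NCBI Gene (formerly Entrez Gene) IDs."""
    params = {"db": "gene", "id": ",".join(str(g) for g in gene_ids), "retmode": "json"}
    return ESUMMARY + "?" + urlencode(params)  # commas are encoded as %2C


def fetch_gene_summaries(gene_ids):
    """Fetch summaries over the network; the 'name' key is an assumed docsum field."""
    with urlopen(gene_summary_url(gene_ids), timeout=30) as response:
        result = json.load(response)["result"]
    return {uid: result[uid].get("name") for uid in result["uids"]}


# GeneIDs taken from the NZ_CP010086.2 snippet earlier in the thread.
print(gene_summary_url([31661091, 31661092]))
```

NCBI limits how fast E-utilities can be called, so batching many IDs into one request is preferable to looping one ID at a time.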

I provided direct links in my comment above; clicking them (the blue highlighted text) should open the right page.

Some of the annotation you see here may have been generated by automated annotation programs. Someone has to do the manual work of verifying that annotation, which is one of the reasons it is cheap to sequence genomes but much more expensive to annotate them properly.

Yes, you're right: manual annotation is needed to put all the pieces together, and I am trying to do that for our lab strain. I have been adding information manually to our strain's table. Right now I am stuck on this Gene ID to Entrez ID conversion, because I want to draw pathways of the differentially expressed genes in KEGG Exp, and it accepts only Entrez IDs. I hope to contribute a better-annotated table for at least one strain so that people can find everything in one place in the future. Thanks for your valuable comments.
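Once a locus_tag to GeneID map exists, attaching GeneIDs to differential expression results is a simple lookup that also lists the genes still needing manual annotation. A minimal sketch, assuming the DE results are (locus_tag, log2 fold change) pairs; the function name and example values are hypothetical, and the exact upload format KEGG Exp expects should be checked on its site.

```python
def attach_gene_ids(de_results, locus_to_geneid):
    """Split DE rows into GeneID-keyed rows and locus tags with no GeneID.

    de_results: iterable of (locus_tag, log2_fold_change) pairs (assumed layout).
    locus_to_geneid: dict mapping locus_tag -> GeneID, e.g. built from a GFF.
    """
    annotated, unmatched = [], []
    for locus_tag, log2fc in de_results:
        gene_id = locus_to_geneid.get(locus_tag)
        if gene_id is None:
            unmatched.append(locus_tag)  # needs manual annotation or another ID source
        else:
            annotated.append((gene_id, log2fc))
    return annotated, unmatched


# Hypothetical fold changes; the second locus tag is made up to show a miss.
de = [("LF65_RS01175", 2.5), ("LF65_RS99999", -1.0)]
mapping = {"LF65_RS01175": "31661092"}
annotated, unmatched = attach_gene_ids(de, mapping)
for gene_id, log2fc in annotated:
    print(f"{gene_id}\t{log2fc}")
print("no GeneID:", unmatched)  # → no GeneID: ['LF65_RS99999']
```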

I am not sure what you are planning to do with this, because KeggExp does not even support Clostridium spp. IDs.

I came across this paper (Specialized activities and expression differences for Clostridium thermocellum biofilm and planktonic cells), in which the authors used KEGG Exp to show differential expression in pathways. This is what I am trying to learn. The KEGG Exp site mentions that you can choose any closely related species from the drop-down menu and proceed with the analysis. Since it asks for the corresponding Entrez IDs and Gene IDs, I think it will match them to the right organism.

