Question: What is an NCBI Gene ID, where can I find it, and how do I convert it to an Entrez ID?
13 days ago, mnazir wrote:

Hi,

I am new to bioinformatics and have only recently started learning. I have a question about gene IDs, if someone can guide me. I want to upload RNA-Seq data to Kegg Exp to draw pathways based on a differential expression analysis. The input file requires gene symbols and Gene IDs, and I don't know where to find these for the microorganism I am working with, Clostridium beijerinckii ATCC 35702. From the papers and other information I have read, I understand that a Gene ID is a number that identifies a specific gene, but I am struggling to find where to look it up.

Thanks a lot

rna-seq tutorial • 108 views
modified 13 days ago by Asaf (6.3k) • written 13 days ago by mnazir (0)

The genome page for this bacterium is available here. If you look for the "GFF" link on that page (search for that word), you can find the annotation file for this genome.

GeneIDs are present in this file, but there are no gene symbols. Here is an example snippet from the file.

NZ_CP010086.2   Protein Homology        CDS     226406  228025  .       +       0       ID=cds207;Parent=gene235;Dbxref=Genbank:WP_017209683.1,GeneID:31661091;Name=WP_017209683.1;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:WP_008426658.1;product=phosphoenolpyruvate--protein phosphotransferase;protein_id=WP_017209683.1;transl_table=11
NZ_CP010086.2   RefSeq  gene    228170  229435  .       +       .       ID=gene236;Dbxref=GeneID:31661092;Name=LF65_RS01175;gbkey=Gene;gene_biotype=protein_coding;locus_tag=LF65_RS01175;old_locus_tag=LF65_00235
NZ_CP010086.2   Protein Homology        CDS     228170  229435  .       +       0       ID=cds208;Parent=gene236;Dbxref=Genbank:WP_041893490.1,GeneID:31661092;Name=WP_041893490.1;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:WP_017209682.1;product=hypothetical protein;protein_id=WP_041893490.1;transl_table=11
NZ_CP010086.2   RefSeq  gene    229596  230771  .       +       .       ID=gene237;Dbxref=GeneID:31661093;Name=LF65_RS01180;gbkey=Gene;gene_biotype=protein_coding;locus_tag=LF65_RS01180;old_locus_tag=LF65_00236
NZ_CP010086.2   Protein Homology        CDS     229596  230771  .       +       0       ID=cds209;Parent=gene237;Dbxref=Genbank:WP_041893492.1,GeneID:31661093;Name=WP_041893492.1;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:WP_011967550.1;product=beta-aspartyl-peptidase;protein_id=WP_041893492.1;transl_table=11
NZ_CP010086.2   RefSeq  gene    230842  231447  .       -       .       ID=gene238;Dbxref=GeneID:31661094;Name=LF65_RS01185;gbkey=Gene;gene_biotype=protein_coding;locus_tag=LF65_RS01185;old_locus_tag=LF65_00237
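
If it helps, the GeneID values can be pulled out of a GFF file like this one with a few lines of Python. This is only a minimal sketch, not an official NCBI tool: it assumes the real file is tab-separated with nine columns (the snippet above shows the tabs as spaces), and the function names `parse_attributes` and `gene_ids_by_locus_tag` are made up for this example. As far as I know, the GeneID in the Dbxref field is the same number that many tools call the Entrez Gene ID.

```python
def parse_attributes(field):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def gene_ids_by_locus_tag(lines):
    """Return {locus_tag: GeneID} for 'gene' features that carry a GeneID."""
    mapping = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # only look at well-formed 'gene' rows
        attrs = parse_attributes(cols[8])
        locus_tag = attrs.get("locus_tag")
        for xref in attrs.get("Dbxref", "").split(","):
            if locus_tag and xref.startswith("GeneID:"):
                mapping[locus_tag] = xref.split(":", 1)[1]
    return mapping


# Rows copied from the snippet above, rebuilt with real tab separators.
example = [
    "\t".join(["NZ_CP010086.2", "RefSeq", "gene", "228170", "229435", ".", "+", ".",
               "ID=gene236;Dbxref=GeneID:31661092;Name=LF65_RS01175;gbkey=Gene;"
               "gene_biotype=protein_coding;locus_tag=LF65_RS01175;"
               "old_locus_tag=LF65_00235"]),
    "\t".join(["NZ_CP010086.2", "Protein Homology", "CDS", "228170", "229435", ".", "+", "0",
               "ID=cds208;Parent=gene236;Dbxref=Genbank:WP_041893490.1,GeneID:31661092;"
               "Name=WP_041893490.1;gbkey=CDS;product=hypothetical protein;"
               "protein_id=WP_041893490.1;transl_table=11"]),
    "\t".join(["NZ_CP010086.2", "RefSeq", "gene", "229596", "230771", ".", "+", ".",
               "ID=gene237;Dbxref=GeneID:31661093;Name=LF65_RS01180;gbkey=Gene;"
               "gene_biotype=protein_coding;locus_tag=LF65_RS01180;"
               "old_locus_tag=LF65_00236"]),
]

print(gene_ids_by_locus_tag(example))
```

To run it on a downloaded file, pass an open file handle instead of the list, for example `gene_ids_by_locus_tag(open("genomic.gff"))`, where `genomic.gff` is a placeholder for whatever name the downloaded file has.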
modified 13 days ago • written 13 days ago by genomax (73k)

Thanks a lot for your comment, I really appreciate it. NZ_CP010086.2 is strain NCIMB14988; however, I am looking for strain ATCC 35702 (accession number CP006777.1), which I have not been able to find. Does that mean the Gene ID information for this strain has not been submitted, and can I use the NZ_CP010086.2 strain information as a reference instead? Or is there something I am missing?

written 12 days ago by mnazir (0)

Here is the page for ATCC 35702. The GFF file for this strain does not have GeneID or locus information.

NZ_CP006777.1   Protein Homology        CDS     31554   32342   .       +       0       ID=cds19;Parent=gene29;Dbxref=Genbank:WP_011967385.1;Name=WP_011967385.1;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:WP_011967385.1;product=Cof-type HAD-IIB family hydrolase;protein_id=WP_011967385.1;transl_table=11
NZ_CP006777.1   RefSeq  gene    32490   34085   .       -       .       ID=gene30;Name=CBS_RS00155;gbkey=Gene;gene_biotype=protein_coding;locus_tag=CBS_RS00155;old_locus_tag=Cbs_0021
NZ_CP006777.1   Protein Homology        CDS     32490   34085   .       -       0       ID=cds20;Parent=gene30;Dbxref=Genbank:WP_011967386.1;Name=WP_011967386.1;gbkey=CDS;inference=COORDINATES: similar to AA sequence:RefSeq:WP_015390206.1;product=heme ABC transporter ATP-binding protein;protein_id=WP_011967386.1;transl_table=11
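
Even without GeneIDs, rows like these still carry identifiers that can go into an annotation table. The sketch below is only an illustration under the same assumptions as any simple GFF reader (tab-separated, nine columns); the function names `parse_attributes` and `cds_table` are made up for this example. It joins each CDS row to its parent gene row and reports the gene name, old_locus_tag, protein_id, product, and whether a GeneID cross-reference is present.

```python
def parse_attributes(field):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def cds_table(lines):
    """Join CDS rows to their parent gene rows via the Parent attribute."""
    genes, cds_rows = {}, []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        if cols[2] == "gene" and "ID" in attrs:
            genes[attrs["ID"]] = attrs
        elif cols[2] == "CDS":
            cds_rows.append(attrs)

    table = []
    for cds in cds_rows:
        # Parent gene may be missing if only part of the file was read.
        gene = genes.get(cds.get("Parent"), {})
        xrefs = cds.get("Dbxref", "").split(",")
        table.append({
            "gene_name": gene.get("Name"),
            "old_locus_tag": gene.get("old_locus_tag"),
            "protein_id": cds.get("protein_id"),
            "product": cds.get("product"),
            "has_gene_id": any(x.startswith("GeneID:") for x in xrefs),
        })
    return table


# The gene30/cds20 pair from the snippet above, rebuilt with real tabs.
example = [
    "\t".join(["NZ_CP006777.1", "RefSeq", "gene", "32490", "34085", ".", "-", ".",
               "ID=gene30;Name=CBS_RS00155;gbkey=Gene;gene_biotype=protein_coding;"
               "locus_tag=CBS_RS00155;old_locus_tag=Cbs_0021"]),
    "\t".join(["NZ_CP006777.1", "Protein Homology", "CDS", "32490", "34085", ".", "-", "0",
               "ID=cds20;Parent=gene30;Dbxref=Genbank:WP_011967386.1;"
               "Name=WP_011967386.1;gbkey=CDS;"
               "product=heme ABC transporter ATP-binding protein;"
               "protein_id=WP_011967386.1;transl_table=11"]),
]

for row in cds_table(example):
    print(row)
```

Collecting the rows in two passes means a CDS is still matched when its gene row appears later in the file, and a CDS whose parent is absent simply gets empty gene fields.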
modified 12 days ago • written 12 days ago by genomax (73k)

Thanks a lot. Can you share the link where you found this information? I go to NCBI and select Genome from the database drop-down menu, but I don't get the same hit as you. Why is that? Also, can you please guide me on how to convert a Gene ID to the corresponding Entrez ID?

Is it possible that a strain without Gene IDs will also lack corresponding Entrez ID numbers? Or can it still have Entrez IDs?

modified 12 days ago • written 12 days ago by mnazir (0)

I provided direct links in my comment above, which should open the right page when you click on them (the blue highlighted text).

Some of the annotation you see here may have been generated by automated programs. Someone has to do the manual work of verifying that annotation. That is one of the reasons it is cheap to sequence things but much more expensive to annotate them properly.

written 12 days ago by genomax (73k)

Yes, you're right: manual annotation is needed to bring all the pieces together, which is what I am trying to do for our lab strain. I have been adding information manually to the table for our strain. Right now I am struggling with this GeneID to Entrez ID conversion, because I wish to draw pathways of differentially expressed genes on Keggexp and it accepts only Entrez IDs. I hope I can put together a better-annotated table for at least one strain, so that people can find everything in one place in future. Thanks for your valued comments.

written 12 days ago by mnazir (0)

I am not sure what you are planning to do with this, because KeggExp does not even support Clostridium spp. IDs: [image: Untitled]

written 12 days ago by Kevin Blighe (50k)

I came across this paper, "Specialized activities and expression differences for Clostridium thermocellum biofilm and planktonic cells" (https://www.nature.com/articles/srep43583), in which Kegg EXP is used to show differential expression in pathways. This is what I am trying to learn. The Kegg EXP site says that you can choose any closely related species from the drop-down menu above and go ahead with the analysis. Since it asks for the corresponding Entrez ID and gene ID, I think it will match them to the right organism.

written 12 days ago by mnazir (0)