Question: Fetching NCBI gene symbols from NCBI protein ids or GI identifiers
8 months ago, Vijay Lakhujani (2.7k), India, wrote:

I have the following information from blastx annotations of bacterial genes predicted by Prodigal:

Sequence name   Sequence desc.  Sequence length Hit desc.   Hit ACC
gene_1_contig_1 excinuclease ABC subunit A  228 gi|1055624747|ref|WP_067265422.1|excinuclease ABC subunit A [Sulfitobacter sp. HI0054] gi|1024544140|gb|KZY51396.1| excinuclease ABC subunit A [Sulfitobacter sp. HI0054]   WP_067265422, KZY51396
gene_2_contig_1 excinuclease ABC subunit A  210 gi|1055651942|ref|WP_067291557.1|excinuclease ABC subunit A [Sulfitobacter sp. EhC04] gi|1032103716|gb|OAN76192.1| excinuclease ABC subunit A [Sulfitobacter sp. EhC04] WP_067291557, OAN76192
gene_3_contig_1 MFS transporter 432 gi|1055624744|ref|WP_067265419.1|MFS transporter [Sulfitobacter sp. HI0054] gi|1024544139|gb|KZY51395.1| hypothetical protein A3734_05250 [Sulfitobacter sp. HI0054]    WP_067265419, KZY51395
gene_4_contig_1 MFS transporter 561 gi|1055624744|ref|WP_067265419.1|MFS transporter [Sulfitobacter sp. HI0054] gi|1024544139|gb|KZY51395.1| hypothetical protein A3734_05250 [Sulfitobacter sp. HI0054]    WP_067265419, KZY51395

I would like to fetch gene symbols using the information from the blastx results (either the GI identifiers or the protein accessions), perhaps using Entrez efetch.

So, the result would be as below:

Gene Name                         Gene symbol
excinuclease ABC subunit A        UvrA

See the link here. However, I am not sure how to proceed in this case. Can anybody suggest an approach?

Tags: entrez, efetch, gene, ncbi

Hi Vijay, did you try using BioMart? It has some useful functions for fetching gene symbols.

written 8 months ago by Sreeraj Thamban (90)

The gene symbol appears to have been included in the description: https://www.ncbi.nlm.nih.gov/protein/1055624747/

modified 8 months ago • written 8 months ago by Sej Modha (3.1k)

Unfortunately, that is not true for all the entries I have. It would have saved a lot of time.

written 8 months ago by Vijay Lakhujani (2.7k)

How about db2db, where you would convert the RefSeq Protein Accession to a Gene ID? https://biodbnet-abcc.ncifcrf.gov/db/db2db.php

written 8 months ago by Sej Modha (3.1k)

You could do something like:

esearch -db protein -query "1055624747" | efetch -format docsum | xtract -pattern Title -element Title

The problem is that you are dealing with WP_* entries, which are non-redundant protein entries shared across multiple strains etc., so the gene symbol is not separately annotated.

written 8 months ago by genomax (51k)
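
Before any of the lookups suggested above can be run in bulk, the identifiers have to be pulled out of the "Hit desc." column. The following is a minimal sketch of one way to do that with standard Unix tools, assuming every hit follows the `gi|<number>|<db>|<accession.version>|` layout shown in the question. The function names `extract_gis` and `extract_accessions` are hypothetical helpers, not part of EDirect or BLAST, and the list of database tags (ref, gb, emb, dbj) is an assumption that may need extending.

```shell
#!/usr/bin/env bash
# Sketch: pull GI numbers and accession.version strings out of a BLAST hit
# description read from standard input.
# extract_gis and extract_accessions are hypothetical helper names.

# Print every GI number (the field right after "gi|"), one per line.
extract_gis() {
    grep -oE 'gi\|[0-9]+' | cut -d'|' -f2
}

# Print every accession.version that follows a database tag such as ref| or gb|.
# The tag list is an assumption; add others if your hits use them.
extract_accessions() {
    grep -oE '(ref|gb|emb|dbj)\|[A-Za-z0-9_]+\.[0-9]+' | cut -d'|' -f2
}

# Hit description for gene_1_contig_1, copied from the question.
hit='gi|1055624747|ref|WP_067265422.1|excinuclease ABC subunit A [Sulfitobacter sp. HI0054] gi|1024544140|gb|KZY51396.1| excinuclease ABC subunit A [Sulfitobacter sp. HI0054]'

printf '%s\n' "$hit" | extract_gis          # one GI number per line
printf '%s\n' "$hit" | extract_accessions   # one accession.version per line
```

Each extracted GI number could then be substituted for the `-query` value in the esearch command quoted above, or the whole list pasted into one of the ID-mapping tools mentioned in this thread.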
8 months ago, Puli Chandramouli Reddy (150), Pune, India, wrote:

Hi,

You can use the GI ids to retrieve the associated information with the UniProt "Retrieve/ID mapping" tool. Select the mapping from "GI number" to "UniProtKB"; the output is a table with all the information you need, and you can choose which columns to display.

Another way is to use Batch Entrez to download the GenBank records and then parse out the information you need.
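
As a rough sketch of the parsing step, assuming the records were saved in GenBank/GenPept flat-file format and that the gene symbol, when present, appears as a `/gene="..."` qualifier in the feature table: the function name `extract_gene_symbols` is a hypothetical helper, and the input fragment below is made up purely for illustration (it is not a real record). Records without a `/gene` qualifier, which the comments above suggest is common for WP_* entries, produce no output.

```shell
#!/usr/bin/env bash
# Sketch: list the distinct /gene="..." qualifier values found in GenBank or
# GenPept flat-file text read from standard input.
# extract_gene_symbols is a hypothetical helper name.
extract_gene_symbols() {
    grep -oE '/gene="[^"]+"' | sed -E 's|^/gene="([^"]+)"$|\1|' | sort -u
}

# Made-up feature-table fragment, for illustration only (not a real record).
extract_gene_symbols <<'EOF'
     CDS             1..100
                     /gene="uvrA"
                     /locus_tag="EXAMPLE_0001"
                     /product="excinuclease ABC subunit A"
EOF
```

To use it on a real download, redirect the saved file into the function, for example `extract_gene_symbols < records.gp`, where `records.gp` is a placeholder file name.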
