Question: Fetching NCBI gene symbols from NCBI protein ids or GI identifiers
11 months ago by Vijay Lakhujani (India):

I have the following information from blastx annotations of bacterial genes predicted by prodigal:

Sequence name	Sequence desc.	Sequence length	Hit desc.	Hit ACC
gene_1_contig_1	excinuclease ABC subunit A	228	gi|1055624747|ref|WP_067265422.1|excinuclease ABC subunit A [Sulfitobacter sp. HI0054] gi|1024544140|gb|KZY51396.1| excinuclease ABC subunit A [Sulfitobacter sp. HI0054]	WP_067265422, KZY51396
gene_2_contig_1	excinuclease ABC subunit A	210	gi|1055651942|ref|WP_067291557.1|excinuclease ABC subunit A [Sulfitobacter sp. EhC04] gi|1032103716|gb|OAN76192.1| excinuclease ABC subunit A [Sulfitobacter sp. EhC04]	WP_067291557, OAN76192
gene_3_contig_1	MFS transporter	432	gi|1055624744|ref|WP_067265419.1|MFS transporter [Sulfitobacter sp. HI0054] gi|1024544139|gb|KZY51395.1| hypothetical protein A3734_05250 [Sulfitobacter sp. HI0054]	WP_067265419, KZY51395
gene_4_contig_1	MFS transporter	561	gi|1055624744|ref|WP_067265419.1|MFS transporter [Sulfitobacter sp. HI0054] gi|1024544139|gb|KZY51395.1| hypothetical protein A3734_05250 [Sulfitobacter sp. HI0054]	WP_067265419, KZY51395

I wish to fetch gene symbols using the information from the blastx results (either the GI identifiers or the protein accessions), maybe using Entrez efetch.
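Either kind of identifier first has to be pulled out of the "Hit desc." column. A minimal sketch in Python, assuming the hit descriptions keep the `gi|<number>|<db>|<accession>|` layout shown in the table; the name `parse_hit_ids` is hypothetical and not part of any library:

```python
import re

# Matches NCBI-style defline IDs such as "gi|1055624747|ref|WP_067265422.1|".
# Assumption: accessions look like letters/underscore, digits, optional ".version".
HIT_ID = re.compile(r"gi\|(\d+)\|(\w+)\|([A-Z_]+\d+(?:\.\d+)?)\|")


def parse_hit_ids(hit_desc):
    """Return (gi, database, accession) tuples found in one hit description."""
    return HIT_ID.findall(hit_desc)


# Hit description of gene_1_contig_1, copied from the table.
hit = (
    "gi|1055624747|ref|WP_067265422.1|excinuclease ABC subunit A "
    "[Sulfitobacter sp. HI0054] gi|1024544140|gb|KZY51396.1| "
    "excinuclease ABC subunit A [Sulfitobacter sp. HI0054]"
)

for gi, db, accession in parse_hit_ids(hit):
    print(gi, db, accession)
```

Each hit yields both the RefSeq (`ref`) and GenBank (`gb`) identifiers, so either can be passed on to a lookup step.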

So, the result would be as below:

Gene Name                         Gene symbol
excinuclease ABC subunit A        UvrA

See the link here. However, I am not sure how to proceed in this case. Can anybody please suggest something?

entrez efetch gene ncbi • 601 views

Hi Vijay, did you try using BioMart? It has some useful functions for fetching gene symbols.

written 11 months ago by Sreeraj Thamban

The gene symbol appears to have been included in the description: https://www.ncbi.nlm.nih.gov/protein/1055624747/

written 11 months ago by Sej Modha

Unfortunately, that is not true for all the entries I have. It would have saved a lot of time.

written 11 months ago by Vijay Lakhujani

How about db2db where you would convert RefSeq Protein Accession to Gene ID? https://biodbnet-abcc.ncifcrf.gov/db/db2db.php

written 11 months ago by Sej Modha

You could do something like:

esearch -db protein -query "1055624747" | efetch -format docsum | xtract -pattern Title -element Title

The problem is that you are dealing with WP_* entries, which are non-redundant protein records shared across multiple strains, so the gene symbol is not separately annotated.
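For scripting outside the shell, roughly the same title lookup can be sketched in Python against the E-utilities `esummary` endpoint. This assumes the default (version 1.0) DocSum XML, in which the title appears as `<Item Name="Title">`; the function names are hypothetical, and only the standard library is used:

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(ids, db="protein"):
    """Build an esummary request URL for a list of GI numbers or accessions."""
    query = urllib.parse.urlencode({"db": db, "id": ",".join(ids)})
    return f"{ESUMMARY}?{query}"


def titles_from_docsum(xml_text):
    """Map each record ID to its Title, assuming version 1.0 DocSum XML."""
    root = ET.fromstring(xml_text)
    titles = {}
    for docsum in root.iter("DocSum"):
        uid = docsum.findtext("Id")
        titles[uid] = docsum.findtext("Item[@Name='Title']")
    return titles


def fetch_titles(ids):
    """Query NCBI (needs network access) and return {id: title}."""
    with urllib.request.urlopen(esummary_url(ids)) as response:
        return titles_from_docsum(response.read().decode("utf-8"))
```

Calling `fetch_titles(["1055624747"])` would return the record title for that GI. As with the command-line version, this gives the protein description, not a separately annotated gene symbol.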

written 11 months ago by genomax
11 months ago by Puli Chandramouli Reddy (Pune, India):

Hi,

You can use the GI IDs to retrieve the associated information with the UniProt "Retrieve/ID mapping" tool. Select "GI number" as the source and "UniProtKB" as the target; the output is a table containing the information you need, and you can choose which columns to show.

Another way is to use Batch Entrez to download the GenBank records, which you then need to parse.
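The parsing step could look roughly like the sketch below. It assumes the records are in GenPept flat-file format and that the symbol, when present, is stored in a `/gene="..."` qualifier; records without that qualifier (which, as noted in the comments, is common for WP_* entries) simply yield an empty list. The function names are hypothetical; the URL uses the E-utilities `efetch` endpoint with `rettype=gp`:

```python
import re
import urllib.parse
import urllib.request

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Feature-table qualifier holding the gene symbol, e.g. /gene="uvrA".
GENE_QUALIFIER = re.compile(r'/gene="([^"]+)"')


def genpept_url(accession):
    """Build an efetch URL that returns one protein record as GenPept text."""
    params = {"db": "protein", "id": accession, "rettype": "gp", "retmode": "text"}
    return f"{EFETCH}?{urllib.parse.urlencode(params)}"


def gene_symbols(genpept_text):
    """Return the unique /gene values in a record, in order of first appearance."""
    seen = []
    for symbol in GENE_QUALIFIER.findall(genpept_text):
        if symbol not in seen:
            seen.append(symbol)
    return seen


def fetch_gene_symbols(accession):
    """Download one record (needs network access) and extract its gene symbols."""
    with urllib.request.urlopen(genpept_url(accession)) as response:
        return gene_symbols(response.read().decode("utf-8"))
```

The same `gene_symbols` function can be applied to a file saved from Batch Entrez, since it only needs the record text.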
