Question: NCBI Protein GI to Genome Accession
Sej Modha (Glasgow, UK) wrote 5.4 years ago:

I have a list of protein GIs and would like to get the accession number of the corresponding genome (the DBSOURCE field in the GenBank file) using E-utilities.

What would be the easiest way to get genome accession numbers for a list of protein GIs?

 

Tags: eutils, sequence, ncbi
Sej Modha (Glasgow, UK) wrote 4.5 years ago:

An alternative to the Perl E-utilities script is to use the Unix command-line E-utilities (Entrez Direct). The following one-liner does the job. Note that the -id parameter of the elink command accepts accession numbers as well as GIs.

elink -db protein -id 817524604 -target nuccore|efetch -format acc
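
Since the question asks about a list of GIs, the one-liner can be wrapped in a loop. The following is only a sketch: the function name protein_to_nuccore and the tab-separated output layout are my own choices, and it assumes the EDirect tools elink and efetch are on your PATH. Redirecting elink's standard input from /dev/null is a precaution so that it cannot consume the ID list that the loop is reading.

```shell
#!/bin/sh
# Hypothetical helper (not part of EDirect): reads protein GIs or accessions
# from standard input, one per line, and prints
#   <protein id><TAB><linked nucleotide accessions, comma-separated>
# Assumes the EDirect commands elink and efetch are installed and on PATH.
protein_to_nuccore() {
  while IFS= read -r id; do
    [ -n "$id" ] || continue    # skip blank lines
    # Same pipeline as the one-liner; </dev/null keeps elink from
    # reading the remaining IDs from the loop's standard input.
    accs=$(elink -db protein -id "$id" -target nuccore </dev/null |
           efetch -format acc | paste -sd, -)
    printf '%s\t%s\n' "$id" "$accs"
  done
}
```

With the IDs in a file (here called protein_gi.txt, a made-up name), it would be used as: protein_to_nuccore < protein_gi.txt > gi_to_accession.tsv. This makes one elink/efetch round trip per ID, so it will be slow for very long lists.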
genomax (United States) wrote 5.3 years ago:

For future reference:

NCBI provides a useful file that maps between various accession numbers and gene names: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz

This file is regenerated automatically every day and should contain the latest information about GeneIDs and accession numbers.
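
As a rough sketch of how this file could answer the original question offline, the awk filter below pulls out the genomic accession for each protein GI in a list. The function name gi_to_genomic_acc is made up for illustration. The column positions (7 for protein_gi, 8 for genomic_nucleotide_accession.version) and the use of "-" for missing values are assumptions about the file layout, so please check them against the header line of your copy; they can be overridden with the third and fourth arguments.

```shell
#!/bin/sh
# Hypothetical helper: print "<protein GI><TAB><genomic accession>" for every
# GI in the list that has a genomic accession in the gene2accession table.
#   $1  file with one protein GI per line (must not be empty)
#   $2  uncompressed gene2accession table, or "-" to read it from stdin
#   $3  protein GI column (assumed default: 7)
#   $4  genomic accession column (assumed default: 8)
gi_to_genomic_acc() {
  awk -F '\t' -v gi_col="${3:-7}" -v acc_col="${4:-8}" '
    NR == FNR { if ($1 != "") wanted[$1] = 1; next }  # first file: GI list
    /^#/      { next }                                # skip the header line
    ($gi_col in wanted) && $acc_col != "-" && !seen[$gi_col, $acc_col]++ {
      print $gi_col "\t" $acc_col                     # each pair only once
    }
  ' "$1" "$2"
}
```

To avoid unpacking the file on disk, it can be streamed: gzip -dc gene2accession.gz | gi_to_genomic_acc protein_gi.txt - (protein_gi.txt is again a placeholder name). GIs that do not appear in the file simply produce no output line.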


rasche.eric replied 4.8 years ago:

Just wanted to say thanks for this, genomax2. It saved me a massive amount of time!

I threw together a makefile and a small Python script to convert the 4.8 GB uncompressed file into a Kyoto Cabinet database, in case anyone wants it.

Sej Modha (Glasgow, UK) wrote 5.3 years ago:

Here is a Perl script to do this. It takes a file with one protein GI per line as its only argument. This script is explained in detail here

#!/usr/bin/perl

use strict;
use warnings;
use Bio::DB::EUtilities;

# Input file: one protein GI per line, e.g. 817524604
my $infile = $ARGV[0] or die "Usage: $0 <file of protein GIs>\n";

open(my $in, '<', $infile) or die "can't open $infile: $!\n";

while (my $line = <$in>)
{
  chomp $line;
  next unless length $line;    # skip blank lines
  my @ids = ($line);

  # elink: map the protein ID to the linked nucleotide record(s)
  my $elink = Bio::DB::EUtilities->new(-eutil          => 'elink',
                                       -email          => 'mymail@foo.bar',
                                       -db             => 'nucleotide',
                                       -dbfrom         => 'protein',
                                       -correspondence => 1,
                                       -id             => \@ids);

  # iterate through the LinkSet objects
  while (my $ds = $elink->next_LinkSet)
  {
    my $protid = join(',', $ds->get_submitted_ids);
    print "Protein ID:" . $protid . "\t";
    my $nucid = join(',', $ds->get_ids);
    print "Nuc ID:" . $nucid . "\t";

    # efetch: retrieve the accession(s) of the linked nucleotide record(s)
    my $efetch = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                          -db      => 'nucleotide',
                                          -id      => $nucid,
                                          -email   => 'mymail@foo.bar',
                                          -rettype => 'acc');
    my @accs = split(m{\n}, $efetch->get_Response->content);
    print "Genome Accession: " . join(',', @accs), "\n";
  }
}

close($in);
josev.die wrote 5 months ago:

Using R:

# Dependencies
library(rentrez) 

# NCBI Protein GI to Genome Accession
gi = 817524604
gi_link = entrez_link(dbfrom = 'protein', db = 'nuccore', id = gi)
nucc_id = gi_link$links$protein_nuccore
nucc = entrez_summary(db = 'nuccore', id = nucc_id)
nucc$caption