Question: How Can I Download GenBank Files With Just The Accession Number?
6.4 years ago, biohack92 (United States) wrote:

I've got an array full of accession numbers, and I'm wondering if there's a way to automatically save GenBank files using BioPerl. I know you can grab sequence information, but I want the entire GenBank record.

#!/usr/bin/env perl
use strict;
use warnings;
use Bio::DB::GenBank;

my @accession;
open (REFINED, "./refine.txt") || die "Could not open: $!";

while(<REFINED>){
    if(/^(\D+)\|(.*?)\|/){
        push(@accession, $2);
    }
}
close REFINED;
foreach my $number(@accession){

    my $db_obj = Bio::DB::GenBank->new;
}
6.4 years ago, joey0214.zhong (Canada) wrote:

I use this to get GenBank files from a text file of accession numbers:

#!/usr/bin/perl -w
# @author: joey
# usage: perl get_multi_seq_fromNCBI_by_acc.pl acc_file.txt
# Fetches sequences from NCBI by accession number and names them by accession.
# $ARGV[0] = acc.txt (one accession number per line)

use strict;
use Bio::DB::GenBank;
use Bio::SeqIO;
use Bio::Seq::RichSeq;

open(FILE, $ARGV[0]) || die("can not open file: $!");
my @acc = <FILE>;
chomp(@acc);    # strip trailing newlines so the accessions are clean

my $db = Bio::DB::GenBank->new();
my $allseq = $db->get_Stream_by_acc([@acc]);
while (my $seq = $allseq->next_seq) {
    #my $filename = $seq->accession;
    # As written, this appends each sequence to output.fasta in FASTA format.
    my $output = Bio::SeqIO->new(-file => ">>output.fasta", -format => "fasta");
    # For one full GenBank file per accession, uncomment $filename above and use this line instead:
    #my $output = Bio::SeqIO->new(-file => ">$filename.gb", -format => "genbank");
    if ($seq) {
        $output->write_seq($seq);
    }
    else {
        print STDERR "cannot find sequence for accession number:@acc \n";
    }
    $output->close();
}
close(FILE);


Here is the link: https://github.com/joey0214/Perl/blob/master/getMultiSeqFromNCBIByAcc.pl

written 6.4 years ago by joey0214.zhong

Thanks Joey. I actually would like the entire GenBank file and not just the sequence. Is there any way to automate that?

written 6.4 years ago by biohack92

I think that script does return the entire GenBank file.

written 6.4 years ago by joey0214.zhong
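
For the full record rather than just the sequence, the output format passed to Bio::SeqIO is what matters: writing with -format => "genbank" keeps the annotations and features. A minimal sketch, assuming BioPerl is installed and the accessions are nucleotide records (Bio::DB::GenPept is the protein counterpart). The helper names gb_filename and save_genbank_records are hypothetical, made up for this illustration:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: build a filesystem-safe output name for an accession.
sub gb_filename {
    my ($acc) = @_;
    (my $safe = $acc) =~ s/[^\w.\-]/_/g;   # replace anything unusual with "_"
    return "$safe.gb";
}

# Hypothetical helper: fetch each accession and write one GenBank file for it.
sub save_genbank_records {
    my (@accessions) = @_;

    # Loaded here so gb_filename() stays usable without BioPerl installed.
    require Bio::DB::GenBank;
    require Bio::SeqIO;

    my $db = Bio::DB::GenBank->new;
    for my $acc (@accessions) {
        # get_Seq_by_acc throws on failure, so trap it and move on.
        my $seq = eval { $db->get_Seq_by_acc($acc) };
        unless ($seq) {
            warn "Could not fetch $acc\n";
            next;
        }
        my $out = Bio::SeqIO->new(
            -file   => '>' . gb_filename($acc),
            -format => 'genbank',
        );
        $out->write_seq($seq);
        $out->close;
    }
}

# Only contacts NCBI when accessions are given on the command line.
save_genbank_records(@ARGV) if @ARGV;
```

Run it as "perl save_gb.pl ACC1 ACC2" (the script name is arbitrary), or call save_genbank_records(@accession) with the array from the question.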
6.4 years ago, qiyunzhu (Buffalo) wrote:

Here's a very simple non-BioPerl solution. It simply connects to NCBI over HTTP and downloads the GenBank files.

use LWP::Simple;
# Look up the GI numbers for the accessions with esearch.
my $s = get "http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=".join (",", @accession);
my @gi;
push (@gi, $1) while ($s =~ s/<Id>(\d+)<\/Id>//);
# Fetch the GenBank records for those GIs with efetch.
$s = get "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=gb&id=".join (",", @gi);
print $s;

You don't need to run esearch with an accession number (ACN). Just use efetch with the ACN instead of the GI.

written 6.4 years ago by Pierre Lindenbaum

Awesome! I didn't know that before. It worked! Thank you so much!

So the better version should be (I cannot believe how simple it is):

use LWP::Simple;
$s = get "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=gb&id=".join (",", @accession);
print $s;

But this trick does not work for esummary, I just realized.

Anyway, it's a good piece of information, and I should apply it to my programs.

modified 6.4 years ago • written 6.4 years ago by qiyunzhu
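
For a long accession list, the single efetch call can be split into batches so each URL stays a reasonable length. A hedged sketch of that idea: the https endpoint and retmode=text are added here, LWP::Protocol::https is assumed to be installed, and the batch size of 100 and the one-second pause are arbitrary choices rather than NCBI requirements. The function names efetch_url and fetch_genbank are hypothetical:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Build the efetch URL for one batch of accessions.
# db and rettype follow the snippet above; retmode=text is an addition.
sub efetch_url {
    my ($db, @accessions) = @_;
    my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';
    return "$base?db=$db&rettype=gb&retmode=text&id=" . join(',', @accessions);
}

# Fetch all accessions in batches and return the concatenated records.
sub fetch_genbank {
    my ($db, $batch_size, @accessions) = @_;

    # Loaded here so efetch_url() can be used without LWP installed.
    require LWP::Simple;

    my $text = '';
    while (my @batch = splice(@accessions, 0, $batch_size)) {
        my $chunk = LWP::Simple::get(efetch_url($db, @batch));
        die "efetch failed for: @batch\n" unless defined $chunk;
        $text .= $chunk;
        sleep 1;    # pause between requests to stay polite to NCBI
    }
    return $text;
}

# Only contacts NCBI when accessions are given on the command line.
print fetch_genbank('protein', 100, @ARGV) if @ARGV;
```

Redirect the output to a file to keep the records, for example "perl fetch_gb.pl ACC1 ACC2 > records.gb" (the script and file names are arbitrary).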