Question: How to download multiple sequences from a database (ie Genbank) in Perl given a text file that contains the Ids?
0
gravatar for Andrea
3.3 years ago by
Andrea40
Sweden
Andrea40 wrote:

The following code is just for one id and the get_seq_by_id doesnt accept more or an array

use Bio::DB::GenBank;
my $gb=new Bio::DB::GenBank;
my $seq= $gb->get_Seq_by_id ('ADE06225');

write_sequence (">roar.fa", 'fasta',$seq );

Given a text file get me the sequences! Any clue???

ADE06225
KJU84168
CEJ19818
AEG68532
......
ADD COMMENTlink modified 3.3 years ago by SES8.2k • written 3.3 years ago by Andrea40
3
gravatar for SES
3.3 years ago by
SES8.2k
Vancouver, BC
SES8.2k wrote:

I would use NCBI's efetch service for this. To access this service you can write your own script to perform the request or use BioPerl. Since you are using BioPerl I would recommend Bio::DB::EUtilities. Here is an example:

use strict;
use warnings;
use Bio::DB::EUtilities;

my @ids = qw(ADE06225 KJU84168 CEJ19818 AEG68532);

my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                       -db      => 'protein',
                                       -rettype => 'fasta',
                                       -email   => 'mymail@foo.bar',
                                       -id      => \@ids);

my $file = 'myseqs.fasta';

# dump HTTP::Response content to a file (not retained in memory)
$factory->get_Response(-file => $file);

If you want genbank, just change the return type to gb or whatever format you want. If you want to split the individual records returned instead of having them all in one file, this can easily be done with Bio::SeqIO. Check out the EUtilities HOWTO and the EUtilities Cookbook for more examples and explanation.

Also, please avoid cross-posting (I noticed you posted on PerlMonks also). This is fine if there are no responses but it is generally discouraged and seen as a misuse of people's time.

ADD COMMENTlink modified 3.3 years ago • written 3.3 years ago by SES8.2k
1
gravatar for Daniel
3.3 years ago by
Daniel3.7k
Cardiff University
Daniel3.7k wrote:

I wouldn't be surprised if there is a batching method in place in bioperl already, but using just basic perl you can loop very easily with something like this:

use Bio::DB::GenBank;
my $gb=new Bio::DB::GenBank;

my $infile = $ARGV[0];    # first argument after the script is a text file of IDs, one per line
open(IN, "$infile") or die "Unable to open $infile: $!";  

while (<IN>){
    my $id = chomp($_); 
    my $seq= $gb->get_Seq_by_id ('$id');
    write_sequence (">roar.fa", 'fasta',$seq );
}

I haven't tested this but it should get you close. I recommend reading up on loops in perl to see how to do this properly.

ADD COMMENTlink written 3.3 years ago by Daniel3.7k

I do not think we should be giving people code.

ADD REPLYlink written 3.3 years ago by RamRS22k
2

Providing code is encouraged, that is why there is syntax highlighting for code (and there has been lots of discussion about it) and auto-linking for gists. If you don't want to provide code, that's fine, but don't discourage people from helping.

ADD REPLYlink written 3.3 years ago by SES8.2k

I'd definitely read these discussions - I've rarely come across them. Code encourages copy-paste, and must be used as the last option. I'd prefer (both as the one who asks and the one who responds) if people gave me an approach and a few leads and not the answer directly. Direct answers help when the question is on a concept, but when it's code, it's better to have people explore.

In Bioinformatics, of course, there's always a well-tested tool that does what you need doing.

ADD REPLYlink written 3.3 years ago by RamRS22k
0
gravatar for RamRS
3.3 years ago by
RamRS22k
Houston, TX
RamRS22k wrote:
  1. This is a programming question
  2. Look up file reading and loops in Perl
ADD COMMENTlink written 3.3 years ago by RamRS22k

This is not an answer, and this type of response is discouraging for new users. It is also a bit ironic that you made all those posts about how to use the site, including how to provide an answer.

ADD REPLYlink written 3.3 years ago by SES8.2k

This was not a clear bioinformatics question, IMO. However, it as close enough on the border to not close. I do believe providing code cripples learning and that we should stop with providing just the approach.

EDIT: I see how the answer might be seen as a bit curt. Given the user is not new and shows a history of asking code-based questions, I did not treat them like I would a newcomer.

ADD REPLYlink modified 3.3 years ago • written 3.3 years ago by RamRS22k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1370 users visited in the last hour