Question

Fetch Many Files With Accession Number, Output File Format Is Coding Sequences In Fasta Per

12.5 years ago

Shihai Feng • 0

Hi all,

How do I fetch a large number of records from GenBank by accession number? I would like one output file per accession number, containing the coding sequences in FASTA format. Any suggestions are highly appreciated.

Thanks, Shihai

score 1 · Answer 1 · 2011-10-21

12.5 years ago

fransua ▴ 390

Perhaps you can use BioMart at Ensembl: http://www.ensembl.org/biomart/martview/ or BioMart directly: http://www.biomart.org/

score 0 · Answer 2 · 2011-10-21

12.5 years ago

Fabian Bull ★ 1.3k

If you are able to code, have a look at "using Bioperl to retrieve multiple sequences".

Just change the query variable accordingly.

score 0 · Answer 3 · 2011-10-21

You can download the nucleic acid sequences in GenBank format using the following piece of code. Run it in a for loop over your accession numbers, then parse the resulting files one by one to get the CDS.

    use strict;
    use warnings;
    use Bio::DB::GenBank;
    use Bio::SeqIO;

    my $acc       = 'NM_001003933';    # example entry
    my $format    = 'genbank';
    my $file_name = $acc . '.gbk';

    # Fetch the record from GenBank and write it to NM_001003933.gbk
    my $seqout = Bio::SeqIO->new(-file => ">$file_name", -format => $format);
    my $getseq = Bio::DB::GenBank->new;
    my $seq    = $getseq->get_Seq_by_acc($acc);
    $seqout->write_seq($seq);
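
The for loop mentioned above could look roughly like the sketch below. It assumes a plain-text input file with one accession number per line; the script name `fetch_many.pl`, the helper names, and the one-second pause between requests are my own assumptions, not part of the original answer.

```perl
use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::SeqIO;

# Read one accession per line, trimming whitespace and skipping blank lines.
sub read_accessions {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!\n";
    my @accessions;
    while (my $line = <$in>) {
        $line =~ s/^\s+|\s+$//g;
        push @accessions, $line if length $line;
    }
    close $in;
    return @accessions;
}

# Fetch each accession and write it to its own <accession>.gbk file.
sub fetch_to_files {
    my @accessions = @_;
    my $gb = Bio::DB::GenBank->new;
    for my $acc (@accessions) {
        # BioPerl throws on unknown accessions or network errors; trap and move on.
        my $seq = eval { $gb->get_Seq_by_acc($acc) };
        unless ($seq) {
            warn "Skipping $acc: ", ($@ || "no record returned\n");
            next;
        }
        my $out = Bio::SeqIO->new(-file => ">$acc.gbk", -format => 'genbank');
        $out->write_seq($seq);
        sleep 1;    # assumed courtesy pause between requests
    }
}

# Usage (hypothetical script name): perl fetch_many.pl accessions.txt
fetch_to_files(read_accessions($ARGV[0])) if @ARGV;
```

Trapping the exception with `eval` means that one bad accession does not stop the whole batch; the skipped accessions are reported on STDERR.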

This piece of code parses a GenBank file and prints the protein translation of the first CDS feature in FASTA format. However, where no CDS is available, as for pseudogenes or rRNA, it can crash; scalar(@cds_features) returns zero in that case, so test it first if your accessions include non-coding sequences.

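    use strict;
    use warnings;
    use Bio::SeqIO;

    my $gb_file = "NM_001003933.gbk";
    my $seq = Bio::SeqIO->new(-file => $gb_file, -format => 'genbank')->next_seq;
    my @cds_features = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;

    # Non-coding records (pseudogenes, rRNA, ...) have no CDS feature; stop cleanly.
    die "No CDS feature found in $gb_file\n" unless @cds_features;

    # Take the first CDS and read one value for each qualifier.
    my @tags = qw/protein_id product translation/;
    my ($feat_object) = @cds_features;
    my ($id, $product, $translation) = map { ($feat_object->get_tag_values($_))[0] } @tags;

    print ">", $id, " ", $product, "\n", $translation, "\n";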
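
If the nucleotide coding sequences are wanted, one FASTA file per accession, something along these lines might work. It is only a sketch: the helper name `write_cds_fasta`, the `.cds.fasta` output suffix, and the fallback to the record's accession number when a CDS has no `protein_id` are my assumptions. It relies on `spliced_seq` to join the parts of a split CDS location.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Write every CDS of one GenBank file as nucleotide FASTA.
# Returns the number of CDS features written (0 for non-coding records).
sub write_cds_fasta {
    my ($gb_file, $fasta_file) = @_;
    my $seq = Bio::SeqIO->new(-file => $gb_file, -format => 'genbank')->next_seq
        or return 0;
    my @cds = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;
    return 0 unless @cds;    # pseudogene, rRNA, ...: nothing to write

    my $out = Bio::SeqIO->new(-file => ">$fasta_file", -format => 'fasta');
    for my $feat (@cds) {
        my $cds_seq = $feat->spliced_seq;    # joins exons for split locations
        my ($id) = $feat->has_tag('protein_id')
            ? $feat->get_tag_values('protein_id')
            : $seq->accession_number;
        my ($product) = $feat->has_tag('product')
            ? $feat->get_tag_values('product')
            : '';
        $cds_seq->display_id($id);
        $cds_seq->desc($product);
        $out->write_seq($cds_seq);
    }
    return scalar @cds;
}

# Convert every downloaded .gbk file in the current directory.
for my $gb_file (glob '*.gbk') {
    (my $fasta_file = $gb_file) =~ s/\.gbk$/.cds.fasta/;
    my $n = write_cds_fasta($gb_file, $fasta_file);
    warn "No CDS feature in $gb_file, skipped\n" unless $n;
}
```

Records without a CDS are reported and skipped rather than crashing the run, which covers the pseudogene and rRNA case mentioned above.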