8.5 years ago by
You have two problems to solve here: fetching the sequences, and parsing them to extract the CDS features.
To deal with the second problem first: install BioPerl, if you have not already done so, then take a look at the SeqIO HOWTO. If you installed the accessory scripts, there's a handy utility named bp_extract_feature_seq, which you can run like this:
bp_extract_feature_seq -i NC_005213.gb --format genbank --feature=CDS -o NC_005213.fa
It will write a FASTA file containing the sequence of every CDS feature in the GenBank record.
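If you would rather do the extraction in your own script, here is a minimal sketch using Bio::SeqIO directly. It is not the source of bp_extract_feature_seq, just the same general idea. The sub name extract_cds, the output file name NC_005213_cds.fa and the choice of locus_tag as the FASTA ID are my own assumptions, so adjust them to taste.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: copy every CDS feature of a GenBank file to a FASTA file.
# Returns the number of CDS sequences written.
sub extract_cds {
    my ($in_file, $out_file) = @_;
    my $in  = Bio::SeqIO->new(-file => $in_file,     -format => 'genbank');
    my $out = Bio::SeqIO->new(-file => ">$out_file", -format => 'fasta');
    my $count = 0;
    while (my $seq = $in->next_seq) {
        for my $feat ($seq->get_SeqFeatures) {
            next unless $feat->primary_tag eq 'CDS';
            # spliced_seq joins split locations and respects the strand
            my $cds = $feat->spliced_seq;
            $count++;
            # Assumption: name each sequence by locus_tag, else by a running number
            my ($id) = $feat->has_tag('locus_tag')
                     ? $feat->get_tag_values('locus_tag')
                     : ($seq->display_id . "_cds$count");
            $cds->display_id($id);
            $out->write_seq($cds);
        }
    }
    return $count;
}

# Usage: perl extract_cds.pl NC_005213.gb NC_005213_cds.fa
if (@ARGV == 2) {
    my $n = extract_cds(@ARGV);
    print "Wrote $n CDS sequences to $ARGV[1]\n";
}
```

The advantage over the command-line utility is that you can filter features or change the header line inside the loop.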
As for the first problem, you'll want to automate the fetching by looping through the replicon accessions. Here's some sample code from the BioPerl tutorial which will fetch a sequence from RefSeq and write it in GenBank format:
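use Bio::Perl;  # exports get_sequence and write_sequence
# fetch the RefSeq record, then save it as a GenBank file
my $seq_object = get_sequence('refseq', "NC_005213");
write_sequence(">NC_005213.gb", 'genbank', $seq_object);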
It should not be too hard to wrap that in a loop over the NC_* accessions from your file.
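Something along these lines might work as a starting point. It is only a sketch: I am assuming your file has one accession per line, and the sub names read_accessions and fetch_all, the '#' comment convention and the one-second pause between requests are my own choices rather than anything from the tutorial.

```perl
use strict;
use warnings;
use Bio::Perl;  # exports get_sequence and write_sequence

# Read one accession per line, skipping blank lines and '#' comments.
sub read_accessions {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @accessions;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;   # trim surrounding whitespace
        next if $line eq '' or $line =~ /^#/;
        push @accessions, $line;
    }
    close $fh;
    return @accessions;
}

# Fetch each accession from RefSeq and save it as <accession>.gb
sub fetch_all {
    my (@accessions) = @_;
    for my $acc (@accessions) {
        # eval so that one failed download does not stop the whole run
        my $seq_object = eval { get_sequence('refseq', $acc) };
        unless ($seq_object) {
            warn "Could not fetch $acc, skipping\n";
            next;
        }
        write_sequence(">$acc.gb", 'genbank', $seq_object);
        sleep 1;  # assumption: a short pause to go easy on the remote server
    }
}

# Usage: perl fetch_replicons.pl accessions.txt
fetch_all(read_accessions($ARGV[0])) if @ARGV;
```

Each resulting .gb file can then be passed to bp_extract_feature_seq as shown above.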