When Annotating Features On Sequences With Bioperl, What Is The Best Way To Pass The Annotated Sequences Between Scripts?
14.0 years ago
Ryan Thompson ★ 3.6k

I am writing a series of scripts that perform and then report miRNA binding-site predictions, and I need to pass the data from one script to the next. The input to the pipeline is any rich sequence format, such as EMBL or GenBank. The first script reads the input sequences and adds features to them; the second script reports on those features. My original plan was to have the first script simply write the sequences out in GenBank format. However, I realized that subfeatures would not be saved if I did this. Furthermore, if I annotate the predicted binding sites using Bio::SeqFeature::Computation instead of Bio::SeqFeature::Generic, I doubt the second script will magically detect that the features in the GenBank file should be converted back into Bio::SeqFeature::Computation objects.

So is something like Storable or Data::Dump a viable alternative for data interchange between my scripts? Are there any caveats I should be aware of when freezing/thawing blessed objects instead of naked data structures?

bioperl data perl sequence
14.0 years ago
Biosidd ▴ 40

You could use Bio::DB::SeqFeature::Store to store the output of the first script and have the next script pick up from there. Any of the file-based backends, such as berkeleydb3, berkeleydb, or SQLite, would be sufficient. Internally, the store serializes features with Storable or Data::Dumper.
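
To illustrate what Storable-based serialization does with blessed objects, here is a minimal sketch using only core Perl modules. `My::Feature` is a hypothetical stand-in for a feature class such as Bio::SeqFeature::Generic, not a BioPerl class, and its `span` and `subfeatures` methods are made up for the example. The two halves are shown in one file for brevity; in a real pipeline they would be separate scripts.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

# Hypothetical feature class, standing in for a real BioPerl feature class.
package My::Feature {
    sub new {
        my ($class, %args) = @_;
        return bless {
            start       => $args{start},
            end         => $args{end},
            subfeatures => [],
        }, $class;
    }
    sub add_subfeature {
        my ($self, $sub) = @_;
        push @{ $self->{subfeatures} }, $sub;
        return $self;
    }
    sub subfeatures { return @{ $_[0]{subfeatures} } }
    sub span        { return $_[0]{end} - $_[0]{start} + 1 }
}

# "Script 1": build a feature with a nested subfeature and write it to disk.
my $site = My::Feature->new(start => 10, end => 31);
$site->add_subfeature(My::Feature->new(start => 10, end => 17));

my (undef, $path) = tempfile(SUFFIX => '.sto', UNLINK => 1);
nstore($site, $path);    # nstore writes in network byte order

# "Script 2": read the structure back. The class name and the nesting
# survive, but the methods only work because My::Feature is loaded here.
my $copy = retrieve($path);
my @subs = $copy->subfeatures;

printf "%s, span %d, %d subfeature(s)\n", ref $copy, $copy->span, scalar @subs;
```

The caveats that follow from this: Storable records the class name but not the class's code, so the reading script has to load the same modules before calling methods on the retrieved objects; by default it refuses to store code references; and retrieved files should come from a trusted source.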

Nice idea, but that doesn't cover the sequences. Ideally, I'd like to store the sequences with all their annotations in a single file, hence my original decision to use GenBank format.

I believe it does load the sequences if your GFF3 file defines them under the ##FASTA directive, or if your SeqFeature has an attached sequence. In your case, it seems you could just load the EMBL/GenBank file through the Bio::DB::Flat family of modules and then access it as a database.

I ended up implementing my own custom "unflatten" subroutine for my particular data, but this is probably a better answer.
