Question: Parsing Genbank Format In Bioperl.
0
8.0 years ago by
Daniel3.8k
Cardiff University
Daniel3.8k wrote:

I am attempting (with BioPerl) to extract the JOURNAL field from a set of GenBank records, but I can't find a list of the accessor methods that are available. For example:

while (my $seq = $in->next_seq() ) {
        print $seq->accession . "\n";
}

prints accession number

while (my $seq = $in->next_seq() ) {
        print $seq->desc . "\n";
}

prints the description

while (my $seq = $in->next_seq() ) {
        print $seq->seq . "\n";
}

prints the gene sequence, and so on.

I pieced this together from the BioPerl site and other questions, as I can't find a reference for the whole scheme. Can anyone point me in the right direction? The page at http://www.bioperl.org/wiki/Module:Bio::SeqIO::genbank is a dead end, unfortunately.

Thanks


For reference:

LOCUS       JQ354682                1420 bp    DNA     linear   PLN 01-JAN-2013
DEFINITION  Gomphonema clevei strain TCC507 ribulose-1,5-bisphosphate
            carboxylase/oxygenase large subunit (rbcL) gene, partial cds;
            chloroplast.
ACCESSION   JQ354682
VERSION     JQ354682.1  GI:410947001
KEYWORDS    .
SOURCE      chloroplast Gomphonema clevei
  ORGANISM  Gomphonema clevei
            Eukaryota; Stramenopiles; Bacillariophyta; Bacillariophyceae;
            Bacillariophycidae; Cymbellales; Gomphonemataceae; Gomphonema.
REFERENCE   1  (bases 1 to 1420)
  AUTHORS   Kermarrec,L., Bouchez,A., Rimet,F. and Humbert,J.-F.
  TITLE     Using a polyphasic approach to explore the diversity and
            geographical distribution of the Gomphonema parvulum (Kutzing)
            Kutzing complex (Bacillariophyta)
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 1420)
  AUTHORS   Kermarrec,L., Bouchez,A., Rimet,F. and Humbert,J.-F.
  TITLE     Direct Submission
  JOURNAL   Submitted (05-JAN-2012) Asconit Consultants, 3 bld de Clairfont
            Bat. G, Toulouges F-66350, France
FEATURES             Location/Qualifiers
     source          1..1420
                     /organism="Gomphonema clevei"
                     /organelle="plastid:chloroplast"
                     /mol_type="genomic DNA"
                     /strain="TCC507"
                     /isolation_source="river"
                     /db_xref="taxon:1223578"
                     /country="Mayotte"
                     /collection_date="20-Apr-2009"
     gene            <1..>1420
                     /gene="rbcL"
     CDS             <1..>1420
                     /gene="rbcL"
                     /codon_start=1
                     /transl_table=11
                     /product="ribulose-1,5-bisphosphate carboxylase/oxygenase
                     large subunit"
                     /protein_id="AFV95053.1"
                     /db_xref="GI:410947002"
                     /translation="DRYESGVIPYAKMGYWDASYAVKTTDVLALFRITPQPGVDPVEA
                     AAAVAGESSTATWTVVWTDLLTACDRYRAKAYRVDPVPNTTDQFFAFIAYECDLFEEG
bioperl genbank • 6.6k views
modified 8.0 years ago by Ryan Dale4.9k • written 8.0 years ago by Daniel3.8k

Hi Daniel, did you find something here that works? I am looking to extract the references from genbank files as well. I have read all the links here and am not having success with creating a perl script that works. Thanks!

written 5.5 years ago by kbrann30

Please read the first answer by Ryan and comments underneath for the solution.

written 5.5 years ago by Neilfws49k

Hello Neilfws. I have looked at those references and it is not straightforward for me. I have the following code that gives me one reference title, but my genbank file has many sequences.

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

my $io = Bio::SeqIO->new(-file => "sequence.gb", -format => "genbank" );
my $seq_obj = $io->next_seq();
my $anno_collection = $seq_obj->annotation;

for my $key ( $anno_collection->get_all_annotation_keys ) {
    my @annotations = $anno_collection->get_Annotations($key);
    for my $value ( @annotations ) {
        if ($value->tagname eq "reference") {
            print "title: ",$value->title(), "\n";
        }
    }
}
modified 14 months ago by _r_am32k • written 5.5 years ago by kbrann30
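
The script above calls next_seq() only once, so it only ever sees the first record in the file. A minimal sketch of the same idea wrapped in a while loop is below. It assumes BioPerl is installed; the helper name reference_titles is made up for illustration, and the file names come from the command line (for example the sequence.gb used above).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# reference_titles() is a hypothetical helper name, not a BioPerl function.
# It returns one [accession, title] pair per REFERENCE block in the file.
sub reference_titles {
    my ($path) = @_;
    my $io = Bio::SeqIO->new(-file => $path, -format => "genbank");
    my @rows;
    # next_seq returns undef after the last record, which ends the loop
    while (my $seq_obj = $io->next_seq) {
        # Asking for the 'reference' key directly avoids looping over every key
        for my $ref ($seq_obj->annotation->get_Annotations('reference')) {
            push @rows, [ $seq_obj->accession_number, $ref->title ];
        }
    }
    return @rows;
}

# Process every GenBank file named on the command line (none given: do nothing)
for my $file (@ARGV) {
    for my $row (reference_titles($file)) {
        my ($acc, $title) = @$row;
        print "$acc\ttitle: ", (defined $title ? $title : ''), "\n";
    }
}
```

Run it as, say, `perl titles.pl sequence.gb` (the script name is arbitrary). Each output line pairs a record's accession with one of its reference titles.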
4
8.0 years ago by
Ryan Dale4.9k
Bethesda, MD
Ryan Dale4.9k wrote:

Searching CPAN for "genbank" finds more detailed Bio::SeqIO::genbank docs, and near the end of those is a link to a how-to on feature annotation (presumably the JOURNAL field is treated as an annotation).

written 8.0 years ago by Ryan Dale4.9k
1

And indeed, extracting the REFERENCE section (to which JOURNAL belongs) is right there in the feature annotation how-to. Just search the page for the phrase "Some Annotation objects, like Reference".

written 8.0 years ago by Neilfws49k
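
For the JOURNAL field specifically, here is a minimal sketch of what that how-to describes. It assumes BioPerl is installed and relies on my understanding that Bio::SeqIO::genbank stores each JOURNAL line in the location() attribute of a Bio::Annotation::Reference object. The helper name journal_lines is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# journal_lines() is a hypothetical helper name, not part of BioPerl.
# It returns "accession<TAB>journal" strings, one per REFERENCE block.
sub journal_lines {
    my ($path) = @_;
    my $in = Bio::SeqIO->new(-file => $path, -format => 'genbank');
    my @lines;
    while (my $seq = $in->next_seq) {
        # REFERENCE blocks are stored under the 'reference' annotation key
        for my $ref ($seq->annotation->get_Annotations('reference')) {
            # Assumption: the JOURNAL text is held in location()
            my $journal = $ref->location;
            push @lines, join("\t", $seq->accession_number,
                              defined $journal ? $journal : '');
        }
    }
    return @lines;
}

# Print the journal lines for every GenBank file named on the command line
print "$_\n" for map { journal_lines($_) } @ARGV;
```

The same Reference objects also expose authors() and title(), so the AUTHORS and TITLE lines can be pulled out the same way.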
1

That's great, thanks. Found what I needed. The link to the CPAN Bio::SeqIO::genbank docs from the wiki doesn't work, so I went searching in a different direction. If I had found that, I think I would have been sorted from the outset.

written 8.0 years ago by Daniel3.8k
2
8.0 years ago by
Istvan Albert ♦♦ 86k
University Park, USA
Istvan Albert ♦♦ 86k wrote:

This is a more complex topic that you will need to spend some time with. Find a good guide on BioPerl in general via Google.

For example I found this good chapter: Beginning Perl for Bioinformatics: Genbank

written 8.0 years ago by Istvan Albert ♦♦ 86k
1

The book just instructed me to parse the file in plain Perl, which would have been my default anyway. I was trying to use this opportunity to learn more about BioPerl, and it was just a single parameter that was catching me out. Found it now though.

written 8.0 years ago by Daniel3.8k
1

My mistake: on cursory examination I thought it used BioPerl, but it seems to use its own custom module, BeginPerlBioinfo.

written 8.0 years ago by Istvan Albert ♦♦ 86k