Question: GenBank parsing using Perl
2.1 years ago by
United States
sharmatina189059 (40) wrote:

I have a GenBank file and want to retrieve relevant information from it. The problem is that when the code fetches the product data, it only considers one line of it, because the while loop reads the file line by line.

#!/usr/local/bin/perl
use strict;
use warnings;

# $ARGV[0]: input GenBank file; $ARGV[1]: output tab-separated table
open(my $gb, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
open(my $ac, '>', $ARGV[1]) or die "Cannot open $ARGV[1]: $!";

print $ac "Locus_Tag\tProtein_ID\tProduct\tDB_Xref\n";

#print "Locus\tLength\tSequence name\tGI\tOrganism\tGene\tCDS\tTranslation\n";

# Each pattern is tested against one line at a time, so a qualifier
# value that wraps onto the next line is never seen as a whole.
while (<$gb>)
{
    if (m{/locus_tag="(.*)"})
    {
        print $ac "\n$1\t";
    }
    elsif (m{/protein_id="(.*)"})
    {
        print $ac "$1\t";
    }
    #elsif (m{/strain="(.*)"})
    #{
    #    print $ac "$1\t";
    #}
    elsif (m{/product="(.*?)"})
    {
        print $ac "$1\t";
    }
    elsif (m{/db_xref="(.*)"})
    {
        print $ac "$1";
    }
    #elsif (m{/translation="(.*)"})
    #{
    #    print $ac "$1\t";
    #}
    #elsif (m{/db_xref="UniProtKB(.*)"})
    #{
    #    print "$1\t";
    #}
    else
    {
        # Skip this data
    }
}

close $ac;
close $gb;

But the product may be given across multiple lines, like this:

     gene            171455..172330
                     /locus_tag="A1S_0148"
     CDS             171455..172330
                     /locus_tag="A1S_0148"
                     /codon_start=1
                     /transl_table=11
                     /product="membrane-bound ATP synthase F0 sector, subunit
                     a"

How can I retrieve such multiline data?

parse genbank file genome • 1.0k views
modified 2.1 years ago by Michael Dondrup (46k) • written 2.1 years ago by sharmatina189059 (40)
2.1 years ago by
Bergen, Norway
Michael Dondrup (46k) wrote:

Don't use a handwritten parser; use Bio::DB::GenBank instead.

written 2.1 years ago by Michael Dondrup (46k)

I want to use this program. I just wanted to get the complete product name, but could not fetch it with this regex.

written 2.1 years ago by sharmatina189059 (40)

GenBank is a notoriously difficult format to handle computationally. BioPerl has all of the tools you need to retrieve the product/feature fully. I don't know Perl well, but it will probably take about 10 lines of code.

written 2.1 years ago by Joe (14k)
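
As a rough illustration of the BioPerl route suggested in this thread, the sketch below reads a local GenBank file and collects the same four columns as the script in the question. Several things here are assumptions rather than anything stated by the posters: it uses Bio::SeqIO, BioPerl's reader for local sequence files, instead of the Bio::DB::GenBank module named in the answer (which retrieves records from NCBI); the helper names first_tag and cds_rows are hypothetical; and only the first value of each qualifier is kept, even though a feature may carry several db_xref entries. BioPerl is not part of core Perl and has to be installed separately.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;    # part of BioPerl, not core Perl

# Hypothetical helper: first value of a qualifier, or '' if it is absent.
sub first_tag {
    my ($feat, $tag) = @_;
    return '' unless $feat->has_tag($tag);
    my ($value) = $feat->get_tag_values($tag);
    return $value;
}

# Hypothetical helper: one tab-separated row per CDS feature in the file.
sub cds_rows {
    my ($path) = @_;
    my $in = Bio::SeqIO->new(-file => $path, -format => 'genbank');
    my @rows;
    while (my $seq = $in->next_seq) {
        for my $feat ($seq->get_SeqFeatures) {
            next unless $feat->primary_tag eq 'CDS';
            push @rows, join "\t",
                map { first_tag($feat, $_) }
                qw(locus_tag protein_id product db_xref);
        }
    }
    return @rows;
}

# When a file name is given, print the table to standard output.
if (@ARGV) {
    print "Locus_Tag\tProtein_ID\tProduct\tDB_Xref\n";
    print "$_\n" for cds_rows($ARGV[0]);
}
```

Because the library parses the feature table itself, a wrapped /product value should come back as a single string without any special handling in the calling code. Restricting the loop to CDS features also avoids the duplicate rows that the gene and CDS entries would otherwise produce for the same locus_tag.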

But my code works well with the single-line data in the GenBank file. Anyway, thanks for the reply.

written 2.1 years ago by sharmatina189059 (40)

Your code does not work correctly, or the format causes problems for it, as you have experienced; therefore there is no good reason to use it. 'I want' is not a good argument in this case. If you want to see how to parse the format correctly, the only thing I can recommend is to look at the code of the Perl package.

modified 2.1 years ago • written 2.1 years ago by Michael Dondrup (46k)
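
For readers who still want to see why the line-by-line regex in the question misses wrapped values, here is a minimal sketch of one way a hand-written reader could join continuation lines. It is not a full GenBank parser and rests on several assumptions: a qualifier starts with "/name=" after leading whitespace, a quoted value ends on the first line that finishes with a double quote, wrapped lines are joined with a single space (which would be wrong for /translation), and doubled quotes inside values are not handled. The helper name read_qualifiers is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns [name, value] pairs for every "/name=value"
# qualifier read from the filehandle, joining values that wrap across lines.
sub read_qualifiers {
    my ($fh) = @_;
    my @pairs;
    while (my $line = <$fh>) {
        $line =~ s/\s+$//;    # drop the newline and trailing whitespace
        next unless $line =~ m{^\s+/(\w+)=(.*)$};
        my ($name, $value) = ($1, $2);
        if ($value =~ /^"/) {
            # Quoted value: keep appending lines until the closing quote.
            while ($value !~ /^".*"$/) {
                my $next = <$fh>;
                last unless defined $next;    # file ended inside a value
                $next =~ s/^\s+|\s+$//g;
                $value .= " $next";
            }
            $value =~ s/^"//;
            $value =~ s/"$//;
        }
        push @pairs, [$name, $value];
    }
    return @pairs;
}

# Usage example on the excerpt from the question (in-memory filehandle).
my $sample = <<'END';
     CDS             171455..172330
                     /locus_tag="A1S_0148"
                     /codon_start=1
                     /transl_table=11
                     /product="membrane-bound ATP synthase F0 sector, subunit
                     a"
END
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
for my $pair (read_qualifiers($fh)) {
    print join("\t", @$pair), "\n";
}
close $fh;
```

The key difference from the script in the question is that the reader consumes extra lines from the same filehandle while a quoted value is still open, instead of testing each line in isolation. The edge cases listed above are the kind of detail that the replies in this thread have in mind when they recommend an existing library.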