Question

Getting intergenic sequences using BioPerl?

0

Entering edit mode

9.0 years ago

fr ▴ 210

Hello! I am trying to parse many genbank files using BioPerl in order to get intergenic sequences. I am trying to adapt the answer at Perl script: generate pseudo-CDSs from intergenic region. However, I can't get to solve the errors I commented there. Namely:

I always get the first term to retrieve the feature "source". When I specify to ignore it (e.g. next if $two[0]->primary_tag eq "source" or shift @two if $two[0]->primary_tag eq "source"), all reads are ignored.
Even in cases where an actual sequence should be retrieved, $subseq is still undef and I don't know why.

All help is appreciated. If you know somewhere where I can find something similar or point me out what's wrong, I really appreciate it!

Here is my code as of now:

use strict;
use warnings;
use autodie;
use Data::Dump;
use Data::Dumper;
use Bio::SeqIO;
STDOUT->autoflush(1);

my $gbffpath='C:\My gbff path';

opendir(DIR, "$gbffpath");
my @files= grep {/\.gbff$/ } readdir(DIR);
closedir(DIR);
my $featc=0;
foreach my $gbfffile (@files){
    my $in = Bio::SeqIO->new(-file=>"<$gbffpath\\$gbfffile", -format=>"genbank");
    my $obj = $in->next_seq();
    my @features = $obj->get_SeqFeatures();
    my @two = shift @features;

    foreach my $fefe  (@features){
        my $pt = $fefe->primary_tag();

        if ($pt eq "CDS" or $pt eq "rRNA" or $pt eq "tRNA") {
            $featc++;
            last if $featc==5;
            push @two, $fefe;

            my $teststart=$two[0]->start();
            my $endi = $two[0]->end();
            my $starti = $two[1]->start();
            my $testend=$two[1]->end();
            my $end = $endi +1;
            my $start = $starti -1;
            my $subseq = $obj->subseq($end,$start) unless ($end>=$start);

            dd ($gbfffile,$pt,
                "\$two\[0\]:",$two[0]->primary_tag(),"teststart:$teststart","endi:$endi","\$two\[1\]:",$two[1]->primary_tag(),"starti:$starti","testend:$testend","after:","end:$end","start:$start",$subseq);

            shift @two;       
        }
    }   
}

perl genome sequence • 1.7k views

ADD COMMENT • link updated 24 months ago by Ram 44k • written 9.0 years ago by fr ▴ 210