Entering edit mode
9.5 years ago
fr
▴
220
Hello! I am trying to parse many genbank files using BioPerl in order to get intergenic sequences. I am trying to adapt the answer at Perl script: generate pseudo-CDSs from intergenic region. However, I can't get to solve the errors I commented there. Namely:
- I always get the first term to retrieve the feature "source". When I specify to ignore it (e.g.
next if $two[0]->primary_tag eq "source"
orshift @two if $two[0]->primary_tag eq "source"
), all reads are ignored. - Even in cases where an actual sequence should be retrieved,
$subseq
is stillundef
and I don't know why.
All help is appreciated. If you know somewhere where I can find something similar or point me out what's wrong, I really appreciate it!
Here is my code as of now:
use strict;
use warnings;
use autodie;
use Data::Dump;
use Data::Dumper;
use Bio::SeqIO;
STDOUT->autoflush(1);
my $gbffpath='C:\My gbff path';
opendir(DIR, "$gbffpath");
my @files= grep {/\.gbff$/ } readdir(DIR);
closedir(DIR);
my $featc=0;
foreach my $gbfffile (@files){
my $in = Bio::SeqIO->new(-file=>"<$gbffpath\\$gbfffile", -format=>"genbank");
my $obj = $in->next_seq();
my @features = $obj->get_SeqFeatures();
my @two = shift @features;
foreach my $fefe (@features){
my $pt = $fefe->primary_tag();
if ($pt eq "CDS" or $pt eq "rRNA" or $pt eq "tRNA") {
$featc++;
last if $featc==5;
push @two, $fefe;
my $teststart=$two[0]->start();
my $endi = $two[0]->end();
my $starti = $two[1]->start();
my $testend=$two[1]->end();
my $end = $endi +1;
my $start = $starti -1;
my $subseq = $obj->subseq($end,$start) unless ($end>=$start);
dd ($gbfffile,$pt,
"\$two\[0\]:",$two[0]->primary_tag(),"teststart:$teststart","endi:$endi","\$two\[1\]:",$two[1]->primary_tag(),"starti:$starti","testend:$testend","after:","end:$end","start:$start",$subseq);
shift @two;
}
}
}