Getting intergenic sequences using BioPerl?
0
0
Entering edit mode
8.8 years ago
fr ▴ 210

Hello! I am trying to parse many genbank files using BioPerl in order to get intergenic sequences. I am trying to adapt the answer at Perl script: generate pseudo-CDSs from intergenic region. However, I can't get to solve the errors I commented there. Namely:

  1. I always get the first term to retrieve the feature "source". When I specify to ignore it (e.g. next if $two[0]->primary_tag eq "source" or shift @two if $two[0]->primary_tag eq "source"), all reads are ignored.
  2. Even in cases where an actual sequence should be retrieved, $subseq is still undef and I don't know why.

All help is appreciated. If you know somewhere where I can find something similar or point me out what's wrong, I really appreciate it!

Here is my code as of now:

use strict;
use warnings;
use autodie;
use Data::Dump;
use Data::Dumper;
use Bio::SeqIO;
STDOUT->autoflush(1);

my $gbffpath='C:\My gbff path';

opendir(DIR, "$gbffpath");
my @files= grep {/\.gbff$/ } readdir(DIR);
closedir(DIR);
my $featc=0;
foreach my $gbfffile (@files){
    my $in = Bio::SeqIO->new(-file=>"<$gbffpath\\$gbfffile", -format=>"genbank");
    my $obj = $in->next_seq();
    my @features = $obj->get_SeqFeatures();
    my @two = shift @features;

    foreach my $fefe  (@features){
        my $pt = $fefe->primary_tag();

        if ($pt eq "CDS" or $pt eq "rRNA" or $pt eq "tRNA") {
            $featc++;
            last if $featc==5;
            push @two, $fefe;

            my $teststart=$two[0]->start();
            my $endi = $two[0]->end();
            my $starti = $two[1]->start();
            my $testend=$two[1]->end();
            my $end = $endi +1;
            my $start = $starti -1;
            my $subseq = $obj->subseq($end,$start) unless ($end>=$start);

            dd ($gbfffile,$pt,
                "\$two\[0\]:",$two[0]->primary_tag(),"teststart:$teststart","endi:$endi","\$two\[1\]:",$two[1]->primary_tag(),"starti:$starti","testend:$testend","after:","end:$end","start:$start",$subseq);

            shift @two;       
        }
    }   
}
perl genome sequence • 1.7k views
ADD COMMENT

Login before adding your answer.

Traffic: 1309 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6