Getting intergenic sequences using BioPerl?
Entering edit mode
8.8 years ago
fr ▴ 210

Hello! I am trying to parse many genbank files using BioPerl in order to get intergenic sequences. I am trying to adapt the answer at Perl script: generate pseudo-CDSs from intergenic region. However, I can't get to solve the errors I commented there. Namely:

  1. I always get the first term to retrieve the feature "source". When I specify to ignore it (e.g. next if $two[0]->primary_tag eq "source" or shift @two if $two[0]->primary_tag eq "source"), all reads are ignored.
  2. Even in cases where an actual sequence should be retrieved, $subseq is still undef and I don't know why.

All help is appreciated. If you know somewhere where I can find something similar or point me out what's wrong, I really appreciate it!

Here is my code as of now:

use strict;
use warnings;
use autodie;
use Data::Dump;
use Data::Dumper;
use Bio::SeqIO;

my $gbffpath='C:\My gbff path';

opendir(DIR, "$gbffpath");
my @files= grep {/\.gbff$/ } readdir(DIR);
my $featc=0;
foreach my $gbfffile (@files){
    my $in = Bio::SeqIO->new(-file=>"<$gbffpath\\$gbfffile", -format=>"genbank");
    my $obj = $in->next_seq();
    my @features = $obj->get_SeqFeatures();
    my @two = shift @features;

    foreach my $fefe  (@features){
        my $pt = $fefe->primary_tag();

        if ($pt eq "CDS" or $pt eq "rRNA" or $pt eq "tRNA") {
            last if $featc==5;
            push @two, $fefe;

            my $teststart=$two[0]->start();
            my $endi = $two[0]->end();
            my $starti = $two[1]->start();
            my $testend=$two[1]->end();
            my $end = $endi +1;
            my $start = $starti -1;
            my $subseq = $obj->subseq($end,$start) unless ($end>=$start);

            dd ($gbfffile,$pt,

            shift @two;       
perl genome sequence • 1.7k views

Login before adding your answer.

Traffic: 1309 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6