Question: How to skip a genebank file that has no dna sequence?
1
gravatar for rf
4.1 years ago by
rf100
rf100 wrote:

Hi!

I'm parsing a series of gbff taken from NCBI. However, some of them don't have the DNA sequence inside (e.g.: see the .gbff.gz in here ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000149425.1_ASM14942v1)

What I want to do is to skip these files, but I dont know what to call on Bio::SeqIO. Could anyone just tell me what should I call?

A snippet of my code is:

 

my $in=Bio::SeqIO->new(-file=>"$mygbkff",-format=>'genbank',-alphabet=>"dna");

while (my $seq=$in->next_seq){
    for my $feat ($seq -> get_SeqFeatures){
        next unless (($feat->primary_tag eq "CDS") and (defined $feat->spliced_seq->seq)); #this was my attempt at finding only cds that had sequences annotated
        print "\>";
        print $feat->spliced_seq->seq,"\n";
    }
}
genebank genome perl • 975 views
ADD COMMENTlink modified 4.1 years ago • written 4.1 years ago by rf100
1

You found the right solution but I would recommend adding that as an answer so it can be marked as solved. Just for reference, the parentheses aren't needed and you may want to check if the seq is defined (flow control) rather than not being defined. For example, next unless defined $seq->seq or if (defined $seq->seq) { .... } are perhaps more expressive.

ADD REPLYlink written 4.1 years ago by SES8.1k

Thanks for the suggestion, I used it in the answer below. And thanks for the suggestion of the parentheses, it is cleaner

ADD REPLYlink written 4.1 years ago by rf100
2
gravatar for rf
4.1 years ago by
rf100
rf100 wrote:

Adding this as an answer as requested.

The solution was, rather than verifying if $feat is defined, to verify if $seq was defined:

 

my $in=Bio::SeqIO->new(-file=>"$mygbkff",-format=>'genbank',-alphabet=>"dna");

while (my $seq=$in->next_seq){
    next unless defined $seq->seq;#This ignores those sequences whose DNA was missing
    for my $feat ($seq -> get_SeqFeatures){
        next unless (($feat->primary_tag eq "CDS") and (defined $feat->spliced_seq->seq)); #this was my attempt at finding only cds that had sequences annotated
        print "\>";
        print $feat->spliced_seq->seq,"\n";
    }
}
ADD COMMENTlink written 4.1 years ago by rf100
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2481 users visited in the last hour