How Can I Get Sequences From A Gff3 File?
13.6 years ago
David B ▴ 70

As you know, GFF3 files can contain FASTA sequences after the feature table.

How do I extract a specific FASTA sequence given its ID? I tried this:

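use strict;
use warnings;
use Bio::Tools::GFF;
use Data::Dumper;

my $gffio = Bio::Tools::GFF->new(
    -file        => "/path/to/file.gff",
    -gff_version => 3,
);

# Dump whatever sequences the parser has collected so far
print Dumper $gffio->get_seqs();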

But it seems to return nothing, although the GFF3 file contains sequences and is valid. I am able to parse the features themselves (using $gffio->next_feature()), but not the sequences (using $gffio->get_seqs()).

gff sequence bioperl • 4.5k views
13.6 years ago
brentp 24k

For some (undocumented) reason, the BioPerl GFF parser requires you to iterate through all the features before you can get the sequences. This limitation is most likely because the sequences appear at the end of the file, but as far as I could see the documentation does not say so.

I was able to get your example to work by adding:

while($gffio->next_feature()){ }

on the line before the print statement.
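
To get one specific sequence by ID, as the question asks, the same workaround can be wrapped in a small helper. This is only a sketch: get_gff3_sequence is a hypothetical name, not part of BioPerl, and it assumes that get_seqs() returns Bio::Seq objects whose display_id matches the ID in the FASTA header.

```perl
use strict;
use warnings;
use Bio::Tools::GFF;

# Hypothetical helper (not part of BioPerl): return the Bio::Seq object
# whose display ID equals $wanted_id, or undef if no such sequence exists.
sub get_gff3_sequence {
    my ($gff_path, $wanted_id) = @_;

    my $gffio = Bio::Tools::GFF->new(
        -file        => $gff_path,
        -gff_version => 3,
    );

    # Drain the feature iterator first; the sequences only become
    # available once the parser has read past the feature table.
    while ( $gffio->next_feature() ) { }

    # Assumption: each returned object is a Bio::Seq with a display_id.
    for my $seq ( $gffio->get_seqs() ) {
        return $seq if $seq->display_id eq $wanted_id;
    }
    return;
}

# Usage with placeholder values:
# my $seq = get_gff3_sequence('/path/to/file.gff', 'my_seq_id');
# print '>', $seq->display_id, "\n", $seq->seq, "\n" if $seq;
```

For a large file this reads every feature just to reach the sequences, so it may be slow, but it stays within the parser calls already used in this thread.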

Thanks, this seems to work, but the other option for getting sequences, $gffio->seq_id_by_h() (documented here), does not. Very strange.
