Question: Converting Gbff To Gff3
Farhat (Pune, India) wrote, 9.0 years ago:

I am trying to convert a gbff-formatted annotation file to GFF3 so that I can use it with GBrowse. I did find a tool for this, but the GFF file it produced had grown past 250 GB and was still growing when I killed it. How can I go about this conversion?

gff format conversion • 8.6k views
Bach wrote, 9.0 years ago:

You have not specified what is in your gbff file, nor which tool you used for the conversion (though for the latter I suspect one of the BioPerl ones), so my answer has to be generic and may or may not apply to your case.

Strictly speaking, gbff is "multiple gbk/gbf (GenBank) files concatenated". As GBrowse supports multiple databases, one solution might be to subdivide your gbff file into logical parts, write each part to its own smaller gbff or gbk file, and then process each one into a separate database. Beware: the GBrowse database files need space too; in my environment that is between 2 and 3 times the space taken by the GFF3.


I am using bp_genbank2gff3.pl for the conversion. How does one go about subdividing the gbff file? What I was wondering is how a ~1 GB gbff file turns into a >250 GB GFF file. Is there a lot of redundant information in the second format, or am I missing some parameter in the conversion?

(Reply written 9.0 years ago by Farhat)
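
One way to subdivide a gbff file, as suggested above, is to split it on the `//` line that terminates each GenBank record and write a fixed number of records to each part. The following is a minimal sketch of that idea, not a tested pipeline: the function names, the output naming scheme, and the example file names are all hypothetical, and grouping by record count is only one possible definition of "logical parts".

```python
def iter_genbank_records(lines):
    """Yield each GenBank record as a list of lines, ending with its '//' line."""
    record = []
    for line in lines:
        record.append(line)
        if line.rstrip() == "//":
            yield record
            record = []
    # Keep a trailing record that lacks a terminator, but ignore blank leftovers.
    if any(l.strip() for l in record):
        yield record


def split_genbank(lines, records_per_chunk):
    """Group records into chunks of at most records_per_chunk records."""
    chunk = []
    for record in iter_genbank_records(lines):
        chunk.append(record)
        if len(chunk) == records_per_chunk:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def write_chunks(in_path, out_prefix, records_per_chunk):
    """Write each chunk to '<out_prefix>_NNNN.gbff' (hypothetical naming scheme)."""
    paths = []
    with open(in_path) as handle:
        for i, chunk in enumerate(split_genbank(handle, records_per_chunk), start=1):
            out_path = f"{out_prefix}_{i:04d}.gbff"
            with open(out_path, "w") as out:
                for record in chunk:
                    out.writelines(record)
            paths.append(out_path)
    return paths
```

A call such as `write_chunks("annotation.gbff", "part", 100)` (file names assumed) streams the input line by line, so memory use stays small even for a ~1 GB file. Each part can then be converted separately, which also makes it easier to see which record triggers the runaway output.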
Scott Cain wrote, 9.0 years ago:

Did you compare the input and the output for the initial records? I'm curious what they looked like. Typically, I would expect a GFF file to be smaller than the GenBank file, since the GenBank format is more verbose. That said, the genbank2gff3 script is far from perfect, and its output often needs to be tweaked to be valid GFF3. This is not the fault of the authors: GenBank files vary so widely in their content that the converter can't anticipate what every author of GenBank files will do.
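
A rough way to make that input/output comparison is to count features per type on both sides. The sketch below assumes the standard GenBank flat-file layout (feature keys starting in column 6 of the FEATURES table) and nine-column, tab-separated GFF3 lines; the function names are hypothetical, and it only counts lines rather than validating either format.

```python
from collections import Counter


def count_genbank_features(lines):
    """Count feature keys (gene, CDS, ...) in GenBank flat-file lines."""
    counts = Counter()
    in_features = False
    for line in lines:
        if line.startswith("FEATURES"):
            in_features = True
        elif not line.startswith(" "):
            # Any other top-level keyword (ORIGIN, CONTIG, //, LOCUS) ends the table.
            in_features = False
        elif in_features and line[:5] == "     " and line[5:6].strip():
            # Feature keys start in column 6; qualifier lines are indented further.
            counts[line.split()[0]] += 1
    return counts


def count_gff3_features(lines):
    """Count feature types (column 3) in GFF3 lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            counts[cols[2]] += 1
    return counts
```

Both functions accept any iterable of lines, including an open file handle, so they can be run on the first records of the input and on the head of the partial output. Feature type names may differ between the two formats, so the totals are more telling than an exact name-by-name match: a GFF3 count far above the GenBank count points to duplication rather than verbosity.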

Finally, I would suggest that a better place to ask this question is the BioPerl mailing list, with sample data that reproduces the strange behavior (whatever it is).

(Modified 9 months ago by RamRS)
Ravi wrote, 8.9 years ago:

I tried to create a GFF3 file from a .gbk file using bp_genbank2gff3.pl, but what I get is the same features repeating many times, and the file keeps growing until my hard disk is full. I have tried filtering out all features except "region", but it still repeats a single entry many times. I have attached part of the generated file. Please help.
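
To confirm that kind of repetition without reading the whole (possibly enormous) file, one option is to scan only the first lines of the output and report feature lines that occur more than once. This is a minimal sketch under the assumption that a repeated feature shows up as an identical seqid, type, start, end, strand, and attributes; the function name and the default line limit are hypothetical.

```python
from collections import Counter
from itertools import islice


def repeated_features(lines, max_lines=100_000):
    """Return {feature key: count} for features seen more than once
    among the first max_lines lines of a GFF3 file."""
    seen = Counter()
    for line in islice(lines, max_lines):
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        # seqid, type, start, end, strand, attributes
        key = (cols[0], cols[2], cols[3], cols[4], cols[6], cols[8])
        seen[key] += 1
    return {key: n for key, n in seen.items() if n > 1}
```

Passing an open file handle reads at most `max_lines` lines, so the check stays cheap even while the converter is still writing. A result listing the same coordinates many times would be concrete sample data to include in a bug report.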
