EMBOSS seqret: problem converting GFF + FASTA to EMBL
7.3 years ago
Juke34 7.4k

Hi everyone,

I am trying to use the seqret tool from EMBOSS, but I'm running into some difficulties.

I would like to create an EMBL-formatted file from a GFF3 file and a FASTA file.

I'm using the following command:

seqret -sequence genome.fasta -feature -fformat gff -fopenfile annotation.gff -osformat embl


My FASTA file contains several sequences.

The problem is that the tool writes the complete set of GFF3 features once for every sequence in the FASTA file (before each sequence), instead of writing each feature only with the sequence it belongs to.

Has anyone already experienced this, and do you know a way to avoid the problem?

Or does anyone know of another tool that can do this conversion?
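
One workaround that might be worth trying is to split both inputs per sequence, so that each seqret call only ever sees one sequence and the features that belong to it. Below is a minimal Python sketch of that idea. It is untested against EMBOSS itself: it assumes that seqret attaches the whole feature file to every sequence it reads, and that the first column of the GFF3 matches the first word of each FASTA header. The function names are hypothetical, and the generated command simply reuses the qualifiers from the command above.

```python
from collections import defaultdict


def split_fasta(lines):
    """Group FASTA lines by sequence ID (first word of each header)."""
    records = {}
    current = None
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = [line]
        elif current is not None:
            records[current].append(line)
    return records


def split_gff(lines):
    """Group GFF3 feature lines by their seqid (column 1)."""
    features = defaultdict(list)
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section follows; stop reading features
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        seqid = line.split("\t", 1)[0]
        features[seqid].append(line)
    return features


def seqret_command(fasta_path, gff_path):
    """Build the per-sequence command, mirroring the original qualifiers."""
    return (
        f"seqret -sequence {fasta_path} -feature -fformat gff "
        f"-fopenfile {gff_path} -osformat embl"
    )


def write_pairs(fasta_path, gff_path, prefix="part"):
    """Write one FASTA/GFF3 pair per sequence and return the commands."""
    with open(fasta_path) as handle:
        records = split_fasta(handle)
    with open(gff_path) as handle:
        features = split_gff(handle)

    commands = []
    for index, (seqid, fasta_lines) in enumerate(records.items(), start=1):
        # Numbered file names avoid problems with odd characters in IDs.
        out_fasta = f"{prefix}_{index:04d}.fasta"
        out_gff = f"{prefix}_{index:04d}.gff"
        with open(out_fasta, "w") as out:
            out.writelines(fasta_lines)
        with open(out_gff, "w") as out:
            out.write("##gff-version 3\n")
            out.writelines(features.get(seqid, []))
        commands.append(seqret_command(out_fasta, out_gff))
    return commands
```

Calling `write_pairs("genome.fasta", "annotation.gff")` would write the numbered pairs to the current directory and return one command string per sequence; the resulting EMBL records could then be concatenated.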

Thank you

emboss genome sequence software-error • 6.0k views
4
5.8 years ago
Juke34 7.4k

After spending a lot of time on this, I concluded that no tool currently works properly for this purpose (GFF3 to EMBL). In fact, our group was not the only one facing this problem: a converter of this kind was recently released for the Prokka GFF3 output: https://github.com/sanger-pathogens/gff3toembl

On our side, we also developed our own tool, but we implemented something more general that can be applied to any kind of GFF3. We hope to release it publicly in the next few weeks.

0

Here is the tool we developed: https://github.com/NBISweden/EMBLmyGFF3

It works for any type of GFF3 annotation.

0

Thank you for the tool!

2
7.3 years ago
Juke34 7.4k

This page also proposes an easy way to do the conversion using BioPerl: http://ratt.sourceforge.net/transform.html

Now my problem has changed: I have an issue with the locus name. BioPerl says:

--------------------- WARNING ---------------------
MSG: Bad LOCUS name? Changing [NODE_57_length_618_cov_40.4969_ID_247618] to 'unknown' and length to NODE_57_length_618_cov_40.4969_ID_247618


Any suggestions about what kind of locus name is expected, so that it does not get replaced by "unknown"?

0

OK, I have now found information about the expected LOCUS format here: Locus Field Format On Genbank
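
For anyone hitting the same warning, here is a minimal Python sketch of one way around it: rename the sequences consistently in both the FASTA and the GFF3 before converting. It assumes that the long contig ID is what breaks the LOCUS parsing (as far as I know, the fixed-column GenBank LOCUS line historically allowed names of at most 16 characters). The function names, the `max_len` default and the rule of keeping only the `NODE_<n>` prefix are hypothetical choices for illustration, not something BioPerl prescribes.

```python
import re


def shorten_id(seq_id, max_len=16):
    """Return a short ID: keep 'NODE_<n>' if present, else sanitize and cut."""
    match = re.match(r"(NODE_\d+)", seq_id)
    if match:
        short = match.group(1)
    else:
        # Replace anything that is not a letter, digit or underscore.
        short = re.sub(r"[^A-Za-z0-9_]", "_", seq_id)
    return short[:max_len]


def build_mapping(ids, max_len=16):
    """Map each original ID to a short one, refusing ambiguous results."""
    mapping = {}
    used = set()
    for seq_id in ids:
        short = shorten_id(seq_id, max_len)
        if short in used:
            raise ValueError(f"shortened ID {short!r} is not unique")
        used.add(short)
        mapping[seq_id] = short
    return mapping


def rename_fasta(lines, mapping):
    """Yield FASTA lines with the first word of each header renamed."""
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].rstrip("\n").split(None, 1)
            if not parts:
                yield line  # empty header, leave untouched
                continue
            rest = " " + parts[1] if len(parts) > 1 else ""
            yield ">" + mapping.get(parts[0], parts[0]) + rest + "\n"
        else:
            yield line


def rename_gff(lines, mapping):
    """Yield GFF3 lines with column 1 (seqid) renamed.

    Directive and comment lines (starting with '#') pass through unchanged,
    so '##sequence-region' lines would still carry the old IDs.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
        else:
            cols = line.split("\t")
            cols[0] = mapping.get(cols[0], cols[0])
            yield "\t".join(cols)
```

The mapping can be built from the FASTA headers, applied to both files, and kept around so the original contig names can be restored or recorded elsewhere after the conversion.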