How to convert GenBank (.gbk) files to Roary-compatible GFF3 format?
7 months ago
Kumar ▴ 50

I would like to carry out a pan-genome analysis using Roary, so I need to convert GenBank (.gbk) files to Roary-compatible GFF/GFF3 files, which must have the FASTA sequence at the end of each GFF file (example file for Roary GFF format). I am aware that Prokka can generate Roary-compatible GFF3 files, but I have a large number of datasets downloaded from NCBI, so I would like to run the pan-genome analysis with the NCBI annotations. Could anyone please suggest a tool for this conversion? Thanks in advance.

python perl linux bioinformatics bash

The Roary homepage says to use a Perl script:

On NCBI's website, GFF3 files contain only the annotation and not the nucleotide sequence, so they cannot be used. You need to download the GenBank files plus nucleotide sequence and convert them. When downloading, click on the show sequence option, then Update View, then Send to a File of type GenBank. You can then use the BioPerl script bp_genbank2gff3.pl to convert to GFF3. Just be aware that mixing different gene prediction methods and annotation pipelines can give noisier results. Alternatively, you can use ncbi-genome-download to pull down the FASTA files and convert them to GFF3 with Prokka.
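
Whichever route you take, it may be worth checking each converted file before handing a large batch to Roary. The following is only a minimal sketch in Python, not part of Roary or BioPerl: the function name `check_roary_gff` is hypothetical, and it tests just the two layout points mentioned in this thread (a `##FASTA` section at the end, and a sequence record for every contig named in the annotation). Passing this check does not guarantee that Roary will accept the file.

```python
from pathlib import Path


def check_roary_gff(path):
    """Return a list of layout problems; an empty list means none were found.

    Hypothetical helper: it only checks that a ##FASTA section exists and
    that every annotated contig has a matching FASTA record.
    """
    annotated_ids = set()  # seqids from column 1 of the feature lines
    fasta_ids = set()      # record IDs found after the ##FASTA marker
    in_fasta = False

    for line in Path(path).read_text().splitlines():
        if line.startswith("##FASTA"):
            in_fasta = True
        elif in_fasta:
            if line.startswith(">"):
                # The first whitespace-delimited token after '>' is the ID.
                fields = line[1:].split()
                if fields:
                    fasta_ids.add(fields[0])
        elif line and not line.startswith("#"):
            annotated_ids.add(line.split("\t")[0])

    problems = []
    if not in_fasta:
        problems.append("no ##FASTA section")
    else:
        if not fasta_ids:
            problems.append("##FASTA section contains no sequences")
        missing = annotated_ids - fasta_ids
        if missing:
            problems.append(
                "annotated contigs without sequence: " + ", ".join(sorted(missing))
            )
    return problems
```

You could loop this over a directory of converted files and only pass on those for which the returned list is empty.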

I prefer the Prokka way.


See if this helps...