How do I create a GFF(3) file from a GenBank file?

Background

I'm trying to analyse an RNA-seq experiment on Bacillus subtilis PY79. As part of that, I need to create an Ensembl annotation database (EnsDb) using ensembldb. For this I need a GFF file of the genome. I tried downloading one from NCBI, but I get an error because that GFF file lacks "gene_id". Since I cannot find any other GFF file for this strain, I am now trying to generate one from the GenBank file.
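
To confirm the diagnosis before converting anything, one can count how many records in the downloaded GFF actually carry a `gene_id` attribute in column 9. The following is a minimal sketch, assuming a plain tab-separated GFF3 file; the function names `parse_attributes` and `count_gene_id` are hypothetical helpers, not part of ensembldb or any other package.

```python
from urllib.parse import unquote


def parse_attributes(column9: str) -> dict:
    """Split a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if not field:
            continue  # tolerate a trailing ";"
        key, _, value = field.partition("=")
        attrs[key.strip()] = unquote(value.strip())  # undo %-escapes such as %3B
    return attrs


def count_gene_id(gff_lines, feature_type="gene"):
    """Return (number of feature_type records, how many of them carry gene_id)."""
    total = with_gene_id = 0
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more feature lines
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        total += 1
        if "gene_id" in parse_attributes(cols[8]):
            with_gene_id += 1
    return total, with_gene_id
```

Passing an open file handle works because iterating over it yields lines, e.g. `count_gene_id(open("genomic.gff"))`, where `genomic.gff` is a placeholder for whatever name the NCBI download has. A result like `(4000, 0)` would mean no gene record has the attribute.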

The Problem

I have a GenBank (gb) file that I downloaded from NCBI (Send to → File → GenBank (full)). I want to convert it to a GFF3 file. I have attempted several things, but none succeeded.
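
For illustration only, the core of such a conversion can be sketched in plain Python: read the FEATURES table, keep the `gene` features, and write one GFF3 line per gene with both `ID` and `gene_id` set from `/locus_tag`. This is a deliberately limited sketch, assuming a single-record GenBank file with simple `start..end` or `complement(start..end)` locations; partial, joined and multi-line locations are skipped, and the fallback ID format is made up. Whether ensembldb accepts gene-only records like these is not checked here.

```python
import re

# Only simple locations are handled: "start..end" or "complement(start..end)".
# Partial ends (<, >), join(...), order(...) and multi-line locations are skipped.
LOC_RE = re.compile(r"^(complement\()?(\d+)\.\.(\d+)\)?$")


def gff3_escape(value: str) -> str:
    """Percent-encode characters that are reserved in GFF3 column 9."""
    for char, code in (("%", "%25"), (";", "%3B"), ("=", "%3D"), ("&", "%26"), (",", "%2C")):
        value = value.replace(char, code)
    return value


def genbank_genes_to_gff3(gb_text: str) -> list:
    """Turn the 'gene' features of a single-record GenBank file into GFF3 lines."""
    seqid = "unknown"      # replaced by the LOCUS name, then by VERSION if present
    in_features = False
    genes = []             # each entry: [start, end, strand, qualifiers]
    current = None         # the gene feature whose qualifiers are being read
    for line in gb_text.splitlines():
        if line.startswith(("LOCUS", "VERSION")):
            seqid = line.split()[1]
            continue
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if not line.startswith(" "):
            in_features = False   # ORIGIN, CONTIG, // ... end the feature table
            continue
        if not in_features:
            continue
        key, body = line[5:21].strip(), line[21:].strip()
        if key:                   # a new feature key starts in columns 6-21
            current = None
            match = LOC_RE.match(body)
            if key == "gene" and match:
                strand = "-" if match.group(1) else "+"
                current = [int(match.group(2)), int(match.group(3)), strand, {}]
                genes.append(current)
        elif current is not None and body.startswith("/"):
            # Qualifier such as /locus_tag="..."; wrapped values keep only their first line
            qkey, _, qval = body[1:].partition("=")
            current[3][qkey] = qval.strip('"')

    out = ["##gff-version 3"]
    for start, end, strand, quals in genes:
        # Fallback ID for genes without /locus_tag is a made-up format
        gid = gff3_escape(quals.get("locus_tag", f"gene_{start}_{end}"))
        attrs = f"ID={gid};gene_id={gid}"
        if "gene" in quals:
            attrs += f";Name={gff3_escape(quals['gene'])}"
        out.append("\t".join([seqid, "GenBank", "gene", str(start), str(end), ".", strand, ".", attrs]))
    return out
```

The sequence ID is taken from the VERSION line (e.g. an accession with its version suffix) because that is usually what the matching FASTA header uses; usage would be along the lines of `"\n".join(genbank_genes_to_gff3(open("sequence.gb").read()))`, with `sequence.gb` standing in for the downloaded file.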

What I've Tried

The GFF file for this strain does have a gene identifier, and you should be able to use it for counting with featureCounts. This is bacterial RNA-seq, so things are simpler: align with the aligner of your choice, then run featureCounts with the -g gene option. If you use this file, be sure to download the corresponding genome FASTA file to build your indexes, so that all identifiers match.
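
In featureCounts, -t selects which feature type is counted and -g names the attribute used to group those features into meta-features; features sharing the same attribute value are counted together. The grouping step can be sketched as follows. This is a simplified model under stated assumptions, not featureCounts' actual implementation: `group_features` is a hypothetical helper, and records lacking the attribute are simply skipped here, which may not match what the real tool does.

```python
from collections import defaultdict
from urllib.parse import unquote


def parse_attributes(column9: str) -> dict:
    """Split a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if field:
            key, _, value = field.partition("=")
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def group_features(gff_lines, feature_type, attribute):
    """Group features of one type by the value of one attribute.

    Models the idea behind featureCounts' -t/-g: features sharing a value
    form one meta-feature. Records without the attribute are skipped here.
    """
    groups = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        value = parse_attributes(cols[8]).get(attribute)
        if value is not None:
            # Keep (seqid, start, end, strand) for every member of the group
            groups[value].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return dict(groups)
```

With `attribute="gene"`, two CDS records that both carry `gene=abcA` end up in one group, which is why the chosen attribute must be present and consistent across the file.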

This use case is one of the many for which I wrote the bio package. Get the file:

bio fetch NC_022898

Convert the GenBank file to GFF like so:

bio convert NC_022898 --gff > annotations.gff

See more here:

Note how much nicer the gene models made with bio are (and they are fully compatible with featureCounts).
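
Whichever tool produces the GFF, a quick structural check before counting can catch a broken file early. The sketch below checks only a few basic GFF3 column rules (nine tab-separated columns, 1-based start ≤ end, strand one of `+`, `-`, `.`, `?`); it is an assumption-laden illustration with a hypothetical function name, not a reproduction of what featureCounts itself validates.

```python
VALID_STRANDS = {"+", "-", ".", "?"}


def gff3_problems(gff_lines):
    """Return (line_number, message) pairs for basic GFF3 column violations."""
    problems = []
    for n, raw in enumerate(gff_lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence section follows; these are not feature lines
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append((n, f"expected 9 tab-separated columns, got {len(cols)}"))
            continue
        start, end, strand = cols[3], cols[4], cols[6]
        if not (start.isdigit() and end.isdigit()):
            problems.append((n, "start/end must be integers"))
        elif not 1 <= int(start) <= int(end):
            problems.append((n, "need 1 <= start <= end"))
        if strand not in VALID_STRANDS:
            problems.append((n, f"invalid strand {strand!r}"))
    return problems
```

An empty list means the file passed these checks; a common failure it reports is a file whose columns were separated by spaces instead of tabs.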

Disclaimer: the package is still under heavy development and has not been sufficiently tested.
