How do I create a GFF3 file from a GenBank file?
5 months ago

Hello

Background

I'm trying to analyse an RNA-seq experiment on Bacillus subtilis PY79. As part of that, I need to create an EnsDb annotation database using ensembldb (https://bioconductor.org/packages/release/bioc/html/ensembldb.html). For this I need a GFF file of the genome. I tried the one from NCBI, but I get an error because that GFF file lacks "gene_id". Since I cannot find any other GFF file for that strain, I am now trying to generate one from the GenBank (gb) file.

The Problem

I have a GenBank (gb) file that I downloaded from NCBI (https://www.ncbi.nlm.nih.gov/nuccore/NC_022898.1?report=genbank, then Send to -> File -> GenBank (full)). I want to convert it to a GFF3 file. I have tried several approaches, but none have succeeded.

What I've Tried


The GFF file for this strain does have a gene identifier. You should be able to use it for counting with featureCounts. This is bacterial RNA-seq, so things are simpler: align with the aligner of your choice, then run featureCounts with the -g gene option. If you choose this file, be sure to get the corresponding genome FASTA file to create your indexes. That way all identifiers will match.
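
Before choosing a value for -g, it can help to check which attribute keys the gene rows of the GFF actually carry. The following is a minimal Python sketch of such a check, not part of featureCounts or any package mentioned here. The function name attribute_keys and the sample records (accession, coordinates, locus tags) are made up for illustration.

```python
from collections import Counter


def attribute_keys(gff_text, feature_type="gene"):
    """Count the attribute keys (column 9) on rows of one feature type."""
    counts = Counter()
    for line in gff_text.splitlines():
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # A GFF3 record has exactly nine tab-separated columns.
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        for pair in cols[8].split(";"):
            key, sep, _value = pair.strip().partition("=")
            if sep and key:
                counts[key] += 1
    return counts


# Made-up records in the general style of an NCBI GFF3 file.
sample = "\n".join([
    "##gff-version 3",
    "NC_EXAMPLE.1\tRefSeq\tgene\t410\t1750\t.\t+\t.\t"
    "ID=gene-X_0001;Name=dnaA;gene=dnaA;locus_tag=X_0001",
    "NC_EXAMPLE.1\tRefSeq\tCDS\t410\t1750\t.\t+\t0\t"
    "ID=cds-1;Parent=gene-X_0001;gene=dnaA",
    "NC_EXAMPLE.1\tRefSeq\tgene\t1939\t3075\t.\t+\t.\t"
    "ID=gene-X_0002;locus_tag=X_0002",
])

keys = attribute_keys(sample)
for key, n in sorted(keys.items()):
    print(key, n)
```

In this made-up sample only one of the two gene rows has a "gene" attribute while both have "locus_tag". If a real file shows the same pattern, an attribute present on every gene row may be a safer choice for -g.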

5 months ago

This use case is one of the many for which I wrote the bio package. Get the file:

bio fetch NC_022898


Convert the GenBank file to GFF like so:

bio convert NC_022898 --gff > annotations.gff


See more here:

https://www.bioinfo.help/bio-gff.html

Note how much nicer the gene models made with bio are (they are also fully compatible with featureCounts).

Disclaimer: the package is still under heavy development and has not been sufficiently tested.
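
To make the GenBank-to-GFF mapping concrete, here is a minimal Python sketch of the core step: turning a simple GenBank feature location into the start, end and strand columns of a GFF3 record. This is an illustration only and does not show how bio implements the conversion. The function names, the "sketch" source label, the NC_EXAMPLE.1 sequence ID and the coordinates are all hypothetical, and compound locations such as join(...) are deliberately rejected.

```python
import re

# Matches a plain range such as 410..1750, allowing the partial markers < and >.
_RANGE = re.compile(r"^<?(\d+)\.\.>?(\d+)$")


def location_to_gff(location):
    """Return (start, end, strand) for a simple GenBank location string.

    GenBank and GFF3 both use 1-based, inclusive coordinates, so the
    numbers carry over unchanged; only the strand needs to be decoded.
    """
    text = location.strip()
    strand = "+"
    if text.startswith("complement(") and text.endswith(")"):
        strand = "-"
        text = text[len("complement("):-1]
    match = _RANGE.match(text)
    if match is None:
        # join(...), order(...) and single-base locations are out of scope.
        raise ValueError(f"unsupported location: {location!r}")
    return int(match.group(1)), int(match.group(2)), strand


def gff3_line(seqid, feature_type, location, attributes, source="sketch"):
    """Build one tab-separated GFF3 record (score and phase left as '.')."""
    start, end, strand = location_to_gff(location)
    # Assumption: attribute values contain no ';', '=' or tab characters,
    # which GFF3 would require to be percent-encoded.
    attr = ";".join(f"{key}={value}" for key, value in attributes.items())
    return "\t".join(
        [seqid, source, feature_type, str(start), str(end), ".", strand, ".", attr]
    )


# Made-up feature, not taken from any real annotation.
record = gff3_line(
    "NC_EXAMPLE.1",
    "gene",
    "complement(410..1750)",
    {"ID": "gene-X_0001", "gene_id": "X_0001"},
)
print(record)
```

A real converter also has to handle multi-part locations, CDS phase and parent/child links between genes and their CDS features, which is why a tested tool is preferable to a hand-rolled script.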
