Question

adding fields to a genbank (translation)

8.2 years ago

Joe 21k

I have a GenBank file that I got from someone I'm doing some analysis for, and somewhere along the line it was either dodgy to start with or has been borked.

The GBK is in the correct format and has all the information, except that the /translation qualifiers are missing, like so:

LOCUS       Sakai_contig000001    4952793 bp    DNA     linear   UNC 05-JAN-2016
DEFINITION  [gcode=11] [organism=Escherichia coli] [strain=Sakai].
FEATURES             Location/Qualifiers
     CDS             concatenate_genome:85..6084
                     /inference="ab initio prediction:Prodigal:2.60,protein
                     motif:CLUSTERS:PRK09751"
                     /locus_tag="PROKKA_00001"
                     /product="putative ATP-dependent helicase Lhr"
     CDS             concatenate_genome:6081..8195
                     /EC_number="3.6.4.12"
                     /gene="pcrA"
                     /inference="ab initio prediction:Prodigal:2.60,similar to
                     AA sequence:UniProtKB:P64319"
                     /locus_tag="PROKKA_00002"
                     /product="ATP-dependent DNA helicase PcrA"
     CDS             complement(concatenate_genome:9148..9393)
                     /inference="ab initio prediction:Prodigal:2.60"
                     /locus_tag="PROKKA_00003"
                     /product="hypothetical protein"

I still have the locus tags and the coordinates for each CDS in the file, as well as most of the header information such as the inferences. Does anyone know of a way I can read this into a program or script (so far I've fiddled with CLC and Artemis, but without any luck) such that it puts the CDSs in the correct positions, so that I can then write a new GBK that takes this information and gives me the translations as well?

It's important that whatever method is used doesn't alter the locus tags in any way, or else it will screw up some RNA-seq analysis I did prior to discovering this issue.

genbank • 2.8k views

link updated 21 months ago by Ram 43k • written 8.2 years ago by Joe 21k

score 2 · Accepted Answer · 2018-11-26

5.4 years ago

markus.ankenbrand ▴ 20

I just used this script to add translations to a genbank file and it seems to work perfectly: github.com/thackl/seq-scripts/blob/master/bin/gb-add-trans

It is straightforward to use:

./gb-add-trans genbank_without_translation.gb >genbank_with_translation.gb
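
If running the script is not an option, the same idea can be sketched in plain Python. This is not how gb-add-trans works internally, just a rough standard-library illustration under some assumptions: the genome sequence (from the ORIGIN section) is already available as one string, every CDS has a simple start..end location as in the question (no join(...) or partial locations), and the first codon is read as Met when it is ATG, GTG or TTG. The function names, the 58-character wrap width and the start codon set are choices made for this sketch. Every existing line, including the locus tags, is copied through untouched.

```python
import re
from itertools import product

# Codon-to-amino-acid mapping in TCAG order. Table 11 (gcode=11) shares this
# mapping with the standard code and differs only in the permitted start codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
START_CODONS = {"ATG", "GTG", "TTG"}  # assumption: start codons read as Met

# Matches "85..6084", "name:85..6084" and "complement(name:85..6084)".
LOCATION = re.compile(r"^(complement\()?(?:[^:()]+:)?(\d+)\.\.(\d+)\)?$")
# A feature key line has exactly five leading spaces; qualifiers have 21.
FEATURE_KEY = re.compile(r"^ {5}(\S+)\s+(\S+)\s*$")


def translate_cds(genome, location):
    """Translate one CDS given a simple GenBank location (1-based, inclusive)."""
    match = LOCATION.match(location.strip())
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    start, end = int(match.group(2)), int(match.group(3))
    dna = genome[start - 1:end].upper()
    if match.group(1):  # complement(...): reverse-complement the slice
        dna = dna.translate(COMPLEMENT)[::-1]
    protein = "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )
    if protein.endswith("*"):  # drop the terminal stop codon
        protein = protein[:-1]
    if protein and dna[:3] in START_CODONS:
        protein = "M" + protein[1:]
    return protein


def format_translation(protein, indent=21, width=58):
    """Wrap a /translation qualifier the way feature-table qualifiers are laid out."""
    text = f'/translation="{protein}"'
    return "\n".join(
        " " * indent + text[i:i + width] for i in range(0, len(text), width)
    )


def add_translations(lines, genome):
    """Copy lines (no trailing newlines) through unchanged, appending a
    /translation qualifier at the end of every CDS feature."""
    out, pending = [], None
    for line in lines:
        key = FEATURE_KEY.match(line)
        # A new feature key or a top-level keyword (e.g. ORIGIN) closes the CDS.
        if pending and (key or not line.startswith(" ")):
            out.append(format_translation(translate_cds(genome, pending)))
            pending = None
        if key and key.group(1) == "CDS":
            pending = key.group(2)
        out.append(line)
    if pending:
        out.append(format_translation(translate_cds(genome, pending)))
    return out
```

It would be used along the lines of add_translations(open("genbank_without_translation.gb").read().splitlines(), genome), with the result joined by newlines and written to a new file. Diffing the old and new files should then show only added /translation lines.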