Question: adding fields to a genbank (translation)
Joe (United Kingdom) wrote, 4.8 years ago:

I have a GenBank file from someone I'm doing some analysis for, and somewhere along the line it was either dodgy to start with or has been borked.

The gbk is in the correct format and has all the information, except that the /translation qualifiers are missing, like so:

LOCUS       Sakai_contig000001    4952793 bp    DNA     linear   UNC 05-JAN-2016
DEFINITION  [gcode=11] [organism=Escherichia coli] [strain=Sakai].
FEATURES             Location/Qualifiers
     CDS             concatenate_genome:85..6084
                     /inference="ab initio prediction:Prodigal:2.60,protein
                     motif:CLUSTERS:PRK09751"
                     /locus_tag="PROKKA_00001"
                     /product="putative ATP-dependent helicase Lhr"
     CDS             concatenate_genome:6081..8195
                     /EC_number="3.6.4.12"
                     /gene="pcrA"
                     /inference="ab initio prediction:Prodigal:2.60,similar to
                     AA sequence:UniProtKB:P64319"
                     /locus_tag="PROKKA_00002"
                     /product="ATP-dependent DNA helicase PcrA"
     CDS             complement(concatenate_genome:9148..9393)
                     /inference="ab initio prediction:Prodigal:2.60"
                     /locus_tag="PROKKA_00003"
                     /product="hypothetical protein"

 

I still have the locus tags and the coordinates for each CDS in the file, as well as most of the other annotation, such as the inferences. Does anyone know of a way to read this into a program or script (so far I've fiddled with CLC and Artemis, without any luck) so that the CDSs are placed at the correct positions and I can then write out a new GBK that includes the translations as well?

It's important that whatever method I use doesn't alter the locus tags in any way, otherwise it will screw up some RNA-seq analysis I did before discovering this issue.

markus.ankenbrand wrote, 2.0 years ago:

I just used this script to add translations to a genbank file and it seems to work perfectly: github.com/thackl/seq-scripts/blob/master/bin/gb-add-trans

It is straightforward to use:

./gb-add-trans genbank_without_translation.gb >genbank_with_translation.gb
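
For anyone curious what adding a translation involves, here is a minimal Python sketch of the core step. It is not how gb-add-trans works internally (I haven't checked). It assumes the genome sequence is already available as a plain string (from the ORIGIN block or a separate FASTA), that each CDS is a single complete interval, and that the first codon should be reported as M, as is usual for complete bacterial CDSs. The names `translate_cds` and `reverse_complement` are made up for this example. Locus tags are never touched, since only the sequence and coordinates are read.

```python
from itertools import product

# Codon-to-amino-acid assignments in NCBI "TCAG" order. Translation table 11
# (the [gcode=11] in the header) shares these assignments with the standard
# code and differs only in which codons are allowed as starts.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def translate_cds(genome, start, end, complement=False):
    """Translate one CDS given 1-based, inclusive GenBank coordinates.

    Assumption: the CDS is complete, so the first residue is written as M
    (even for GTG/TTG starts) and a trailing stop codon is dropped.
    """
    dna = genome[start - 1:end].upper()
    if complement:
        dna = reverse_complement(dna)
    # Unknown or ambiguous codons become X; a trailing partial codon is ignored.
    protein = [CODON_TABLE.get(dna[i:i + 3], "X")
               for i in range(0, len(dna) - 2, 3)]
    if protein and protein[-1] == "*":
        protein.pop()
    if protein:
        protein[0] = "M"
    return "".join(protein)


# Toy usage: a 9 bp CDS at positions 3..11 of a made-up sequence.
toy_genome = "CC" + "GTGAAATAA" + "GG"
print(translate_cds(toy_genome, 3, 11))
```

The returned string is what would go into a /translation="..." qualifier under the matching CDS. Joined or partial locations would need extra handling that this sketch leaves out.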

Nice find!

Funnily enough I already follow him on github and never saw this code!

(reply written 2.0 years ago by Joe)