Adding fields to a GenBank file (translation)
8.2 years ago
Joe 21k

I have a GenBank file that I got from someone I'm doing some analysis for, and somewhere along the line it was either dodgy to start with or has been borked.

The GBK is in the correct format and has all the information, except that the /translation qualifiers are missing, like so:

LOCUS       Sakai_contig000001    4952793 bp    DNA     linear   UNC 05-JAN-2016
DEFINITION  [gcode=11] [organism=Escherichia coli] [strain=Sakai].
FEATURES             Location/Qualifiers
     CDS             concatenate_genome:85..6084
                     /inference="ab initio prediction:Prodigal:2.60,protein
                     motif:CLUSTERS:PRK09751"
                     /locus_tag="PROKKA_00001"
                     /product="putative ATP-dependent helicase Lhr"
     CDS             concatenate_genome:6081..8195
                     /EC_number="3.6.4.12"
                     /gene="pcrA"
                     /inference="ab initio prediction:Prodigal:2.60,similar to
                     AA sequence:UniProtKB:P64319"
                     /locus_tag="PROKKA_00002"
                     /product="ATP-dependent DNA helicase PcrA"
     CDS             complement(concatenate_genome:9148..9393)
                     /inference="ab initio prediction:Prodigal:2.60"
                     /locus_tag="PROKKA_00003"
                     /product="hypothetical protein"

I still have the locus tags and the coordinates for each CDS in the file, as well as most of the other annotation, such as the inferences. Does anyone know of a way I can read this into a program or script (so far I've fiddled with CLC and Artemis, but without any luck) that puts the CDSs in the correct positions, so that I can then write a new GBK which keeps this information and gives me the translations as well?

It's important that whatever method is used doesn't alter the locus tags in any way, or else it will screw up some RNA-seq analysis I did before discovering this issue.

genbank • 2.8k views
5.4 years ago

I just used this script to add translations to a genbank file and it seems to work perfectly: github.com/thackl/seq-scripts/blob/master/bin/gb-add-trans

It is straightforward to use:

./gb-add-trans genbank_without_translation.gb >genbank_with_translation.gb
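
For anyone curious what adding a translation involves, here is a rough standard-library Python sketch of the core step. It is not the code of gb-add-trans, just an illustration under some assumptions: the genome sequence is already available as a plain string, locations are simple ranges like the ones in the question (no join() or partial < > locations), and the genetic code is table 11, as the [gcode=11] in the DEFINITION line suggests. The function names (parse_location, translate_cds, format_translation) and the 58-character wrap width are made up for this example.

```python
import re

# Codon -> amino acid mapping in NCBI order (bases T, C, A, G).
# Tables 1 and 11 share this mapping; they differ only in start codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Start codons of NCBI translation table 11 (bacterial, archaeal, plant plastid).
START_CODONS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}

# Assumption: only simple ranges, optionally complemented and optionally
# prefixed with a sequence name such as "concatenate_genome:".
LOCATION_RE = re.compile(r"^(complement\()?(?:[\w.]+:)?(\d+)\.\.(\d+)\)?$")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_location(location):
    """Return (start, end, is_reverse) with 1-based inclusive coordinates."""
    match = LOCATION_RE.match(location.strip())
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    return int(match.group(2)), int(match.group(3)), match.group(1) is not None


def translate_cds(genome, location):
    """Translate the CDS at `location`; unknown codons become 'X'."""
    start, end, is_reverse = parse_location(location)
    cds = genome[start - 1:end].upper()
    if is_reverse:
        cds = cds.translate(COMPLEMENT)[::-1]  # reverse complement
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    protein = "".join(CODON_TABLE.get(codon, "X") for codon in codons)
    # An alternative start codon at the first position still codes for Met.
    if codons and codons[0] in START_CODONS:
        protein = "M" + protein[1:]
    # GenBank /translation values do not include the terminal stop.
    if protein.endswith("*"):
        protein = protein[:-1]
    return protein


def format_translation(protein, indent=21, width=58):
    """Wrap a /translation qualifier the way feature tables are laid out."""
    text = f'/translation="{protein}"'
    chunks = [text[i:i + width] for i in range(0, len(text), width)]
    return "\n".join(" " * indent + chunk for chunk in chunks)


if __name__ == "__main__":
    # Toy genome: a forward CDS at 3..11 and a reverse-strand CDS at 14..22.
    toy_genome = "CCATGAAATAAGGTCAAGCCATTT"
    print(format_translation(translate_cds(toy_genome, "concatenate_genome:3..11")))
```

The locus tags and other qualifiers are never touched here; the idea is only to compute the protein string from the coordinates and write the extra /translation lines under each CDS.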

Nice find!

Funnily enough, I already follow him on GitHub and never saw this code!
