Question: Adding features to genbank file based on feature table
10 months ago by
ramiroricardo0 wrote:

I am trying to annotate a genome for which I have a closely related reference. I annotated it with DFAST and ended up with a GenBank file like the one below. As you can see, the first CDS has been annotated as a "hypothetical protein" and lacks a /gene qualifier, whereas the second CDS has been annotated as "putative mobilization protein" and has been given a /gene name (BT_4758). I would like to have gene names for these "hypothetical proteins", since they make up roughly a third of the genome and I know that some of them match the reference at 100% identity. To that end, I used blastp to search all proteins from my new genome against the reference and created a feature table like the one below.

For each /locus_tag in the GenBank file, I would like to first check whether it already has a corresponding /gene. If /gene is present, do nothing. If not, look up the corresponding gene name in the feature table and add it to the GenBank file right after that /locus_tag. I have been looking for ways to do this, but with limited success. Any pointers would be great.
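To make the intended logic concrete, here is a minimal sketch in plain Python (standard library only) that works directly on the text lines of the GenBank file. It is only an illustration under several assumptions: the locus-to-gene mapping is already available as a dict (the name `gene_by_locus` is hypothetical), qualifier lines are indented by at least 10 spaces while feature-key lines such as `CDS` are indented less, each /locus_tag sits on a single line, and no wrapped /note line happens to begin with `/gene=`. The function names are made up for this example.

```python
import re

# Assumption: qualifier and continuation lines are indented by at least this
# many spaces, while feature-key lines (CDS, gene, ...) are indented less.
QUALIFIER_INDENT = 10

LOCUS_TAG_RE = re.compile(r'^(\s*)/locus_tag="([^"]+)"\s*$')


def _annotate_block(block, gene_by_locus):
    """Return the lines of one feature block, adding /gene if it is missing."""
    # If the feature already carries a /gene qualifier, leave it untouched.
    if any(line.lstrip().startswith("/gene=") for line in block):
        return list(block)
    result = []
    for line in block:
        result.append(line)
        match = LOCUS_TAG_RE.match(line)
        if match and match.group(2) in gene_by_locus:
            indent, locus = match.group(1), match.group(2)
            # Insert the new qualifier right after /locus_tag, same indentation.
            result.append(f'{indent}/gene="{gene_by_locus[locus]}"')
    return result


def add_gene_qualifiers(lines, gene_by_locus):
    """Process GenBank text lines (no trailing newlines) feature by feature."""
    out = []
    block = []
    for line in lines:
        indent = len(line) - len(line.lstrip(" "))
        is_continuation = line.strip() != "" and indent >= QUALIFIER_INDENT
        if not is_continuation:
            # A less-indented line starts a new feature (or header) block.
            out.extend(_annotate_block(block, gene_by_locus))
            block = []
        block.append(line)
    out.extend(_annotate_block(block, gene_by_locus))
    return out
```

One way to use it would be to read the file with `text.splitlines()`, pass the result to `add_gene_qualifiers`, and write the returned lines back joined with newlines. A parser-based approach (for example with Biopython) would be more robust to unusual formatting; the line-based version is shown only because it has no dependencies.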

GenBank

     CDS             3205047..3205778
                 /product="hypothetical protein"
                 /inference="COORDINATES:ab initio
                 prediction:Prodigal:2.6.3"
                 /inference="similar to AA sequence:RefSeq:WP_011109414.1"
                 /transl_table=11
                 /codon_start=1
                 /translation="MTKIFGIYPTDRQESITFLNRINTYLCRKLDNQWHCYKIKYSNAD
                 HESCIKKAIDSNAKFILFMGHGRSDCLFGSCNKKSQDFIAEDAVIENPEFYRNEHFIHS
                 DNISKFKGKIFFSLSCLSNRNDTKSLARSAINNGVISFVGFGDIPTDYIVGKNIPLKAI
                 AIYKGIISKVIKISISISIQNNYTVEEMVSLIKVLTTKEIQKIILSPYKNRHKEIIVKN
                 LFLFKQEIMIFGNRYERLLYE"
                 /locus_tag="LOCUS_23770"
                 /note="WP_011109414.1 hypothetical protein (Bacteroides
                 thetaiotaomicron VPI-5482) [pid:95.1%, q_cov:100.0%,
                 s_cov:100.0%, Eval:1.2e-130]"
                 /note="OrthoSearch:AAO79862.1 hypothetical protein
                 (Bacteroides thetaiotaomicron VPI-5482) [pid:95.1%,
                 q_cov:100.0%, s_cov:100.0%, Eval:2.6e-132, RBH]"
                 /note="Prodigal_2381"
     CDS             complement(3205909..3207444)
                 /product="putative mobilization protein"
                 /inference="COORDINATES:ab initio
                 prediction:Prodigal:2.6.3"
                 /inference="similar to AA sequence:INSD:AAO79863.1"
                 /transl_table=11
                 /codon_start=1
                 /translation="MQETRLMENEYSINLPTRFWYRKKEWKGWINVVNPFRASMILGTP
                 GSGKSYAVVNNYIKQAIEKSYALYIYDFKFDDLSVIAYNHLIKYRHRYKIPPKFYVINF
                 DNPRKSHRCNPLAPELMTDISDAYESSYTIMLNLNKSWVQKQGDFFVESPIVLFTAIIW
                 FLKIYEGGKYCTFPHAIELLNKRYEDVFTILTSYPDLENYLSPFIDAWKGGASEQLQGQ
                 IASAKIPLSRLISPQLYWVMSGSDFTLDINNPKEPKVLCVGNNPDRISIYGAALGLYNS
                 RIVKLINKKKQLKSCVIIDELPTIFFKGLDNLIATARSNKVAVVLGFQDFSQLKRDYGD
                 KEAAVIMSTVGNVFSGQVVGETAKTLSERFGKILQKRESMSINRNDTSTSISTQLDSLI
                 PASKISTLSQGMFVGAVTDNFGETIDQKVFHAQIVVDNDAVQKETTSYQPIPEISSFLD
                 ENGNDTMEQQIQANYQQIKQDIVELVENELIRIENDPELKHLLGGDEGARAQA"
                 /locus_tag="LOCUS_23780"
                 /gene="BT_4758"
                 /note="OrthoSearch:AAO79863.1 putative mobilization protein
                 (Bacteroides thetaiotaomicron VPI-5482) [pid:99.6%,
                 q_cov:100.0%, s_cov:100.0%, Eval:7.0e-299, RBH]"
                 /note="COG:COG3505:VirD4 Type IV secretory pathway, VirD4
                 component, TraG/TraD family ATPase  [Category:U,
                 Aligned:90-394, Eval:5.6e-11, score:62.0, N-term missing]"
                 /note="Prodigal_2382"

Feature table (query is my new assembly; subject is the reference genome)

Query ID       Subject ID
LOCUS_00010      BT_4578
LOCUS_00020      BT_4577
LOCUS_00030      BT_2429
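For reference, a table in this shape can be loaded into a lookup dict with a few lines of Python. This is a sketch under assumptions: the table is whitespace-separated with exactly two columns per data row, the header row is the only line with a different number of fields, and if a query appears more than once the first row is the hit to keep (for example because the blastp output was sorted best-first). The function name `read_feature_table` is made up for this example.

```python
def read_feature_table(lines):
    """Map each query locus tag to the subject (reference) gene name."""
    mapping = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines, the "Query ID  Subject ID" header (four fields
        # after splitting), and anything else that is not a two-column row.
        if len(fields) != 2:
            continue
        query, subject = fields
        # Keep only the first hit seen for each query (assumed to be the best).
        mapping.setdefault(query, subject)
    return mapping
```

With a file on disk, this could be called as `read_feature_table(open(path))`, where `path` is wherever the table was saved. Filtering the blastp hits by identity and coverage before building the table is probably worthwhile, so that weak matches do not end up as gene names.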