Obtaining Accession IDs From GenBank .gbf Files
11.2 years ago
rosarylimyt ▴ 70

Can anyone please help me with handling GenBank .gbf files? I recently generated Sequin (.sqn) and GenBank (.gbf) files, but I don't know how to use them to obtain the accession IDs of the translated nucleotide sequences, so that I can learn the names of the proteins identified. The .gbf files look something like this when opened in Notepad:

LOCUS       Scaffold1            1325603 bp    DNA     linear       14-FEB-2013
DEFINITION  No definition line found.
ACCESSION   
VERSION
KEYWORDS    .
SOURCE      Unknown.
  ORGANISM  Unknown.
            Unclassified.
FEATURES             Location/Qualifiers
     source          1..1325603
                     /organism="unknown"
                     /mol_type="genomic DNA"
     gene            complement(<1..555)
                     /locus_tag="asmbl_1"
     CDS             complement(<1..555)
                     /locus_tag="asmbl_1"
                     /codon_start=1
                     /transl_table=11
                     /product="tat (twin-arginine translocation) pathway signal
                     sequence domain protein"
                     /translation="MKEFHSTLSRRDFMKSLGVVGAGLGTMSAAAPVFHDLDEVTSST
                     LGINKNPWWVKERDFKNPTVPIDWSKVTRQPGVFQGLPRPTVADFTKAGVVGGTSTDL
                     ETPEMALTLYDAMAKEFPGWTPGYAGMGDTRTTALCNASKFMMFGAWPGNMEMGGKRV
                     NVIGAIMAAGGSPTFTPWLGPQLDT"
...
...

Does anyone here know of a software tool which I can use to make sense out of these and generate accession IDs for them?

Thank you in advance!

id genbank annotation protein • 4.4k views

Not clear: is this a new .gbf file that you generated yourself from new sequence data? If so, it will not contain any accessions or IDs; those are assigned only after submission to GenBank and curation.

11.2 years ago

As Neilfws points out, if this is your own GenBank file then it won't contain any accession numbers. Check your file for qualifiers such as db_xref (see below):

 gene            1..626
                 /gene="Hbb-b1"
                 /gene_synonym="AA409645; beta1; HBB1; Hbbt1; Hbbt2"
                 /note="hemoglobin, beta adult major chain"
                 /db_xref="GeneID:15129"
                 /db_xref="MGI:96021"

If you do have those you can extract them in various ways, but before we get there let's make sure you have them in the first place.
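One of those ways can be sketched with nothing but the Python standard library. This is only a minimal illustration, assuming each db_xref qualifier sits on a single line in the form /db_xref="Database:Identifier" as in the excerpt above; the function name list_db_xrefs is made up for this example.

```python
import re

# Matches /db_xref="Database:Identifier" and captures the two halves.
# Assumption: the qualifier is not wrapped across lines.
DB_XREF = re.compile(r'/db_xref="([^":]+):([^"]+)"')


def list_db_xrefs(gbf_text):
    """Return a list of (database, identifier) tuples found in the text."""
    return DB_XREF.findall(gbf_text)


# Sample lines taken from the excerpt above.
SAMPLE_XREFS = "\n".join([
    '                 /gene="Hbb-b1"',
    '                 /note="hemoglobin, beta adult major chain"',
    '                 /db_xref="GeneID:15129"',
    '                 /db_xref="MGI:96021"',
])

for database, identifier in list_db_xrefs(SAMPLE_XREFS):
    print(database, identifier)
```

An empty list from this function on your own file would confirm that it carries no cross-references to look up.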


No, the file does not contain any db_xref qualifiers. This .gbf file was generated by CloVR-Search (annotation). Could you please advise how I should go about obtaining accession IDs from it, other than by submitting to GenBank? Is there a program I can use to parse it so that I can isolate only the 'translation=....' parts? Thank you so much!


You will need to parse the file programmatically, for example with Biopython (a Python library). See this section of its tutorial on parsing GenBank files: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc36
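If the goal is only to isolate the translation (and product) text of each CDS, a rough standard-library sketch along the following lines may be enough to get started. It is an illustration, not a replacement for Biopython's parser. It assumes the usual flat-file layout shown in the question: feature keys indented by 5 spaces, qualifiers indented by 21 spaces, quoted values that may wrap over several lines, and no embedded double quotes inside values. The names extract_cds_qualifiers and SAMPLE_GBF are hypothetical, and the sample translation is a shortened copy of the one in the question.

```python
def _is_open(value):
    """True while a quoted qualifier value has not reached its closing quote."""
    return value.startswith('"') and (len(value) == 1 or not value.endswith('"'))


def extract_cds_qualifiers(gbf_text):
    """Return one dict of qualifiers (locus_tag, product, translation, ...) per CDS."""
    cds_list = []
    feature = None    # qualifier dict of the CDS being read, None for other features
    key = None        # name of a qualifier whose wrapped value is still being read
    in_table = False  # True between a FEATURES line and the next top-level keyword
    for line in gbf_text.splitlines():
        if line.startswith("FEATURES"):
            in_table = True
            continue
        if not in_table or not line.strip():
            continue
        if not line.startswith(" "):
            # ORIGIN, //, or anything else at column 0 ends the feature table.
            in_table, feature, key = False, None, None
            continue
        body = line.strip()
        if line[5:6] != " ":
            # Feature key line such as "     CDS             1..555".
            feature = {} if body.split()[0] == "CDS" else None
            key = None
            if feature is not None:
                cds_list.append(feature)
            continue
        if feature is None:
            continue
        if key is None and body.startswith("/"):
            # Start of a new qualifier, e.g. /product="..."
            key, _, value = body[1:].partition("=")
            feature[key] = value
        elif key is not None:
            # Continuation line: protein text is joined directly, prose with a space.
            feature[key] += ("" if key == "translation" else " ") + body
        else:
            continue  # wrapped location line, not a qualifier
        if not _is_open(feature[key]):
            feature[key] = feature[key].strip('"')
            key = None
    return cds_list


PAD = " " * 21
SAMPLE_GBF = "\n".join([
    "FEATURES             Location/Qualifiers",
    "     gene            complement(<1..555)",
    PAD + '/locus_tag="asmbl_1"',
    "     CDS             complement(<1..555)",
    PAD + '/locus_tag="asmbl_1"',
    PAD + "/codon_start=1",
    PAD + '/product="tat (twin-arginine translocation) pathway signal',
    PAD + 'sequence domain protein"',
    PAD + '/translation="MKEFHSTLSRRDFMKSLGVVGAGLGTMSAAAPVFHDLDEVTSST',
    PAD + 'LGINKNPWW"',
    "ORIGIN",
])

for cds in extract_cds_qualifiers(SAMPLE_GBF):
    print(cds.get("locus_tag"), "|", cds.get("product"))
    print(cds.get("translation"))
```

The product qualifier already holds the protein name assigned by the annotation pipeline, and the locus_tag is the only identifier an unsubmitted file carries. The extracted translations could then be searched against a protein database if database accessions of similar proteins are wanted.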


Hi, I tried parsing one of the translated nucleotide scaffold .gbf files with Biopython, as advised:

>>> from Bio import SeqIO
>>> record = SeqIO.read('./rosary/dataset/clovr/Scaffold6.gbf', 'genbank')
>>> record
SeqRecord(seq=Seq('ATGGTGGGCCATCTTGGTCTCGAACCAAGGACCTCAGTCTTATCAGCTCCAACG...TGG', IUPACAmbiguousDNA()), id='', name='Scaffold6', description='No definition line found.', dbxrefs=[])

but it still wouldn't provide me with any sort of identification for that translated scaffold =[
