Question: To Obtain Accession ID From GenBank .gbf Files
6.3 years ago by
rosarylimyt70 wrote:

Can anyone please help me with handling GenBank .gbf files? I recently generated Sequin (.sqn) and GenBank (.gbf) files, but I don't know how to obtain accession IDs for the translated nucleotide sequences from them, so that I can find out the names of the proteins identified. The .gbf files look something like this when I open them in Notepad:

LOCUS       Scaffold1            1325603 bp    DNA     linear       14-FEB-2013
DEFINITION  No definition line found.
SOURCE      Unknown.
  ORGANISM  Unknown.
FEATURES             Location/Qualifiers
     source          1..1325603
                     /mol_type="genomic DNA"
     gene            complement(<1..555)
     CDS             complement(<1..555)
                     /product="tat (twin-arginine translocation) pathway signal
                     sequence domain protein"

Does anyone here know of a software tool which I can use to make sense out of these and generate accession IDs for them?

Thank you in advance!

annotation id protein genbank • 2.8k views
modified 6.3 years ago by Istvan Albert ♦♦ 80k • written 6.3 years ago by rosarylimyt (70)

It is not clear: is this a new .gbf file that you generated for new sequence data? If so, there will not be any accessions or IDs; those are assigned only after submission to GenBank and curation.

written 6.3 years ago by Neilfws (48k)
6.3 years ago by
Istvan Albert ♦♦ 80k (University Park, USA) wrote:

As Neilfws points out, if this is your own GenBank file then there won't be any accession numbers. Check your file for qualifiers such as db_xref, which, when present, sit inside feature blocks like this one:

 gene            1..626
                 /gene_synonym="AA409645; beta1; HBB1; Hbbt1; Hbbt2"
                 /note="hemoglobin, beta adult major chain"

If you do have those you can extract them in various ways, but before we get there let's make sure you have them in the first place.

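A quick way to make sure the qualifiers are there is a plain-text scan of the file. The following is only a minimal sketch, assuming the qualifier is written in the usual /db_xref="database:identifier" form on a single line; the function name find_db_xrefs is hypothetical and not part of any library.

```python
import re


def find_db_xrefs(genbank_text):
    """Return the values of all /db_xref="..." qualifiers in GenBank text."""
    # Qualifiers sit on indented lines that start with a slash; the value
    # is everything between the double quotes.
    pattern = r'^\s*/db_xref="([^"]*)"'
    return re.findall(pattern, genbank_text, flags=re.MULTILINE)
```

Reading the file with open(path).read() and passing the text to find_db_xrefs gives an empty list when the file carries no cross-references at all, which is what one would expect for a freshly generated, unsubmitted annotation.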

No, I do not have db_xref anywhere in the file. This .gbf file was generated by CloVR-Search (annotation). Could you please advise how I should go about obtaining an accession ID from it, other than by submitting to GenBank? Is there a program I can use to parse it so that I can isolate only the 'translation=....' parts? Thank you so much!

written 6.3 years ago by rosarylimyt (70)

You will need to parse the file programmatically, for example with BioPython (a Python library). See this section on parsing GenBank files:

written 6.3 years ago by Istvan Albert ♦♦ 80k
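As an illustration of what such parsing might look like: in Biopython, the product names and translations are not stored on the record itself but in the qualifiers of its CDS features. The sketch below assumes Biopython is installed, that the file holds a single record, and that its CDS features carry /product and /translation qualifiers like the ones mentioned in this thread. The function names load_record and cds_annotations are hypothetical.

```python
def load_record(path):
    """Read a single-record GenBank file into a Biopython SeqRecord."""
    # Biopython is a third-party package; it is imported here so that the
    # helper below can be read and used independently of it.
    from Bio import SeqIO
    return SeqIO.read(path, "genbank")


def cds_annotations(record):
    """Yield (location, product, translation) for every CDS feature."""
    for feature in record.features:
        if feature.type != "CDS":
            continue
        # Qualifier values are lists of strings; a missing qualifier is
        # reported as an empty string instead of raising KeyError.
        product = feature.qualifiers.get("product", [""])[0]
        translation = feature.qualifiers.get("translation", [""])[0]
        yield str(feature.location), product, translation
```

Looping over cds_annotations(load_record('Scaffold6.gbf')) and printing each tuple would list one predicted protein per line. This isolates the translations but does not produce accession IDs, since none exist for an unsubmitted file.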

Hi, I tried parsing one of the translated nucleotide scaffold .gbf files with BioPython, as advised, via:

from Bio import SeqIO
record = SeqIO.read('./rosary/dataset/clovr/Scaffold6.gbf', 'genbank'); record
SeqRecord(seq=Seq('ATGGTGGGCCATCTTGGTCTCGAACCAAGGACCTCAGTCTTATCAGCTCCAACG...TGG', IUPACAmbiguousDNA()), id='', name='Scaffold6', description='No definition line found.', dbxrefs=[])

but it still wouldn't provide me with any sort of identification for that translated scaffold =[

written 6.3 years ago by rosarylimyt (70)