How to Store and View Nucleotide Data from GenBank Format in HDF5 Format
10.4 years ago
chaitu ▴ 10

Example:

LOCUS       X56730                  1560 bp    DNA     linear   PLN 30-JUN-2006
DEFINITION  Yeast gene for proteasome Y13 subunit.
ACCESSION   X56730
VERSION     X56730.1  GI:506479
KEYWORDS    proteasome.
SOURCE      Saccharomyces cerevisiae (baker's yeast)
  ORGANISM  Saccharomyces cerevisiae
            Eukaryota; Fungi; Dikarya; Ascomycota; Saccharomycotina;
            Saccharomycetes; Saccharomycetales; Saccharomycetaceae;
            Saccharomyces.
REFERENCE   1  (bases 1 to 1560)
  AUTHORS   Emori,Y., Tsukahara,T., Kawasaki,H., Ishiura,S., Sugita,H. and
            Suzuki,K.
  TITLE     Molecular cloning and functional analysis of three subunits of
            yeast proteasome
  JOURNAL   Mol. Cell. Biol. 11 (1), 344-353 (1991)
   PUBMED   1898763
REFERENCE   2
  AUTHORS   Emori,Y., Tsukahara,T., Kawasaki,H., Ishiura,S., Sugita,H. and
            Suzuki,K.
  TITLE     Molecular cloning and functional analysis of three subunits of
            yeast proteasome
  JOURNAL   Unpublished
REFERENCE   3  (bases 1 to 1560)
  AUTHORS   Emori,Y.
  TITLE     Direct Submission
  JOURNAL   Submitted (12-NOV-1990) Y. Emori, DEPT OF BIOPHYSICS &
            BIOCHEMISTRY, FACULTY OF SCIENCE, UNIVERSITY OF TOKYO, 7-3-1 HONGO,
            BUNKYO-KU, TOKYO 113, JAPAN
FEATURES             Location/Qualifiers
     source          1..1560
                     /organism="Saccharomyces cerevisiae"
                     /mol_type="genomic DNA"
                     /strain="X2180-1A"
                     /db_xref="taxon:4932"
     TATA_signal     306..309
     TATA_signal     353..356
     CDS             373..1149
                     /codon_start=1
                     /product="proteasome Y13 subunit"
                     /protein_id="CAA40054.1"
                     /db_xref="GI:506480"
                     /db_xref="GOA:P23638"
                     /db_xref="InterPro:IPR000426"
                     /db_xref="InterPro:IPR001353"
                     /db_xref="InterPro:IPR016050"
                     /db_xref="PDB:1FNT"
                     /db_xref="PDB:1G0U"
                     /db_xref="PDB:1G65"
                     /db_xref="PDB:1JD2"
                     /db_xref="PDB:1RYP"
                     /db_xref="PDB:1VSY"
                     /db_xref="PDB:1Z7Q"
                     /db_xref="PDB:2F16"
                     /db_xref="PDB:2FAK"
                     /db_xref="PDB:2GPL"
                     /db_xref="PDB:2ZCY"
                     /db_xref="PDB:3BDM"
                     /db_xref="PDB:3D29"
                     /db_xref="PDB:3DY3"
                     /db_xref="PDB:3DY4"
                     /db_xref="PDB:3E47"
                     /db_xref="PDB:3GPJ"
                     /db_xref="PDB:3GPT"
                     /db_xref="PDB:3GPW"
                     /db_xref="PDB:3HYE"
                     /db_xref="PDB:3L5Q"
                     /db_xref="SGD:S000003367"
                     /db_xref="UniProtKB/Swiss-Prot:P23638"
                     /translation="MGSRRYDSRTTIFSPEGRLYQVEYALESISHAGTAIGIMASDGI
                     VLAAERKVTSTLLEQDTSTEKLYKLNDKIAVAVAGLTADAEILINTARIHAQNYLKTY
                     NEDIPVEILVRRLSDIKQGYTQHGGLRPFGVSFIYAGYDDRYGYQLYTSNPSGNYTGW
                     KAISVGANTSAAQTLLQMDYKDDMKVDDAIELALKTLSKTTDSSALTYDRLEFATIRK
                     GANDGEVYQKIFKPQEIKDILVKTGITKKDEDEEADEDMK"
     polyA_signal    1395..1400
ORIGIN      
        1 ggaagaaggg tggtgttcta gcgatggtaa gattttgcca ttgcccaaag cccgataagc
       61 ctatcccact tcatgaatat ataacactcg cagagctcga tgttggagac agtgagtgag
      121 cagtgaattg ctcatgtttt ctctgcatcc tcatttaatg acaattagcc atgtaataac
      181 atcttgaggc agttaaatat tcgttaccct gcaggtggca aaaaatttat agaataaaag
      241 cataaaaaga tggatatcta tgtaataagg aaacattggc agagcgaaga gaacagactg
      301 ctttctataa aaagttttcg atcagtctct attttaataa ttgattattg gatatagtta
      361 gtagtgttaa acatgggttc cagaagatac gattccagga caacaatttt ctcccctgag
      421 ggacgtctat atcaggttga atacgcgcta gaatccattt cacatgcagg taccgcaatt
      481 gggattatgg catctgatgg gattgttctt gcagcagaac gcaaagtcac aagtacttta
      541 ctagaacaag acacctctac cgaaaaactt tataagttaa acgataaaat tgcggttgcc
      601 gttgctggac tgactgcaga tgcagaaatt ctaataaata cggctagaat tcacgctcaa
      661 aattacctta aaacctataa tgaagatata ccagtagaaa ttttggtgag aaggctaagt
      721 gatataaaac aaggttacac gcaacatggt ggtttaagac catttggtgt gtcctttatc
      781 tacgccggtt atgacgatag atacggttac caattgtata catctaatcc atcgggaaac
      841 tatacagggt ggaaggctat tagtgttggc gctaacacat cagcagcaca aaccctactt
      901 caaatggact acaaggatga tatgaaagtc gatgatgcca ttgaactggc tttaaaaacg
      961 ttatccaaaa ctaccgacag tagcgcgctg acttatgaca ggttggaatt tgctactatc
     1021 agaaagggtg ctaatgacgg agaagtgtat cagaagattt tcaagcctca agagataaag
     1081 gatatattgg taaagactgg tattaccaag aaggatgaag acgaagaagc tgatgaagat
     1141 atgaaataag gttggaaagt attgtttgac ttcatgctta tataaatatg tacgcataga
     1201 aacatatctc taaaattaaa agtgaaagaa aaaaatcgct aagattcccc tttcggggga
     1261 aggctgaaga aacttttttt tgcgcaaatt tcaagatcgg aatcgctcga gaggcaaata
     1321 taaaaaaggg cctgctcgtt tggcctactt gttgttggat tctaactcca atctaatttt
     1381 ggcagcactt caaaaataaa caaaagcgat gtctgcgtca ctagtgaatc gatcattgaa
     1441 aaatataagg aatgaattag aatttttgaa ggaatcaaac gtcatatcag gcgacatttt
     1501 cgaattaatc aatagcaagt tacctgagaa atgggatgga aaccaaagat cgccccaaaa
//

=============================================================================================================

This data should be stored in, and viewable from, HDF5 format.


The example INSDC database entry is from GenBank and is shown in the GenBank format, not the EMBL format used in EMBL-Bank. Compare the representations of the entry from the INSDC member databases:


Thanks for the clarification, I always mix these up.


What's the advantage of storing this in an HDF5 file? And do you just want to store it as one big string, or as a structured document?


I want to store it as a structured document, like XML.


Why not use XML, or an XML-based database like eXist?

10.4 years ago
Michael 54k

After formatting, you can see that this is GenBank-formatted data. You can generally store strings in HDF5, and you could probably define a hierarchical data type to hold all the fields. However, HDF5 works best when the data can be laid out as a large multidimensional array that needs constant-time random access at known coordinates. For a structured record like this one, storing it in HDF5 gives you no advantage. You will most likely need to search by identifier, or even run full-text searches within fields. Database indices are a much better fit for that; HDF5 is a poor one because, as far as I know, it does not support indexing the way MySQL or Postgres do, so every search would have to scan all the fields.

In conclusion, there is no advantage to storing this data in HDF5; the appropriate storage is either an RDBMS or an XML database.
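
To make the indexing point concrete, here is a minimal sketch of an indexed lookup. It uses SQLite from the Python standard library purely as a stand-in for MySQL or Postgres; the table name, column names, index name, and the truncated sequence are illustrative assumptions, not a recommended schema.

```python
import sqlite3

# In-memory database; a real setup would use a file or a database server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry ("
    "accession TEXT PRIMARY KEY, definition TEXT, organism TEXT, sequence TEXT)"
)
# Secondary index so that searches by organism do not scan every row.
conn.execute("CREATE INDEX idx_entry_organism ON entry (organism)")

# Values taken from the record in the question; the sequence is truncated.
conn.execute(
    "INSERT INTO entry VALUES (?, ?, ?, ?)",
    ("X56730", "Yeast gene for proteasome Y13 subunit.",
     "Saccharomyces cerevisiae", "ggaagaaggg"),
)

organism = ("Saccharomyces cerevisiae",)
row = conn.execute(
    "SELECT accession, definition FROM entry WHERE organism = ?", organism
).fetchone()
# Ask SQLite how it would run the query; the plan should name the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT accession FROM entry WHERE organism = ?", organism
).fetchall()

print(row)
print(plan)
```

The query plan is the interesting part: it shows the lookup going through the index instead of a full scan, which is the behaviour a flat collection of HDF5 groups would not give you without building your own lookup structure.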


Thanks for explaining, but I just want to practice writing the code that converts this file into HDF5, for my own understanding. I have plans for it later.
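
For practice, a conversion along these lines might look like the sketch below. It is a rough illustration, not a complete GenBank parser: it keeps only single-line top-level header fields, feature keys with their locations, and the sequence. The `parse_genbank` and `write_hdf5` helpers, the group layout, and the shortened `SAMPLE` record are all assumptions made for the example. Writing the file needs the third-party `h5py` package (with NumPy); if it is not installed, the sketch only parses.

```python
import os
import re
import tempfile

try:
    import h5py          # third-party; assumed installed, e.g. `pip install h5py`
    import numpy as np   # required by h5py
except ImportError:
    h5py = None

# Shortened stand-in for the record in the question (sequence truncated to 30 bases).
SAMPLE = """\
LOCUS       X56730                  1560 bp    DNA     linear   PLN 30-JUN-2006
DEFINITION  Yeast gene for proteasome Y13 subunit.
ACCESSION   X56730
FEATURES             Location/Qualifiers
     CDS             373..1149
                     /product="proteasome Y13 subunit"
ORIGIN
        1 ggaagaaggg tggtgttcta gcgatggtaa
//
"""

def parse_genbank(text):
    """Tiny parser: top-level header fields, feature locations, sequence."""
    record = {"header": {}, "features": [], "sequence": ""}
    section = "header"
    for line in text.splitlines():
        if line.startswith("//"):
            break
        if line.startswith("FEATURES"):
            section = "features"
        elif line.startswith("ORIGIN"):
            section = "origin"
        elif section == "header":
            # Keywords occupy the first 12 columns; indented lines are skipped.
            key, value = line[:12].strip(), line[12:].strip()
            if key and not line.startswith(" "):
                record["header"][key] = value
        elif section == "features":
            # Feature keys start in column 6; qualifier lines are indented further.
            if line[5:6].strip():
                name, location = line.split()[:2]
                record["features"].append((name, location))
        elif section == "origin":
            # Drop the position numbers and spaces, keep only the bases.
            record["sequence"] += re.sub(r"[^a-z]", "", line.lower())
    return record

def write_hdf5(record, path):
    """One group per record: header as attributes, sequence as a uint8 array."""
    with h5py.File(path, "w") as f:
        grp = f.create_group(record["header"]["ACCESSION"])
        for key, value in record["header"].items():
            grp.attrs[key] = value
        seq = np.frombuffer(record["sequence"].encode("ascii"), dtype="uint8")
        grp.create_dataset("sequence", data=seq)
        feats = grp.create_group("features")
        for i, (name, location) in enumerate(record["features"]):
            feats.create_group(f"{i:04d}_{name}").attrs["location"] = location

record = parse_genbank(SAMPLE)
if h5py is not None:
    path = os.path.join(tempfile.mkdtemp(), "x56730.h5")
    write_hdf5(record, path)
    # "View" the stored data: header attributes and the first ten bases.
    with h5py.File(path, "r") as f:
        grp = f["X56730"]
        print(dict(grp.attrs))
        print(grp["sequence"][0:10].tobytes().decode("ascii"))
```

Storing the sequence as one byte per base is the only part that plays to HDF5's strengths, since any coordinate range can be sliced directly. For real files, an established parser such as Biopython's `SeqIO` would be a safer choice than the hand-rolled one above.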


Thanks a lot.
