Question: How to Store and View Nucleotide Data from GenBank Format in HDF5 Format
chaitu wrote, 5.3 years ago:

Example:

LOCUS       X56730                  1560 bp    DNA     linear   PLN 30-JUN-2006
DEFINITION  Yeast gene for proteasome Y13 subunit.
ACCESSION   X56730
VERSION     X56730.1  GI:506479
KEYWORDS    proteasome.
SOURCE      Saccharomyces cerevisiae (baker's yeast)
  ORGANISM  Saccharomyces cerevisiae
            Eukaryota; Fungi; Dikarya; Ascomycota; Saccharomycotina;
            Saccharomycetes; Saccharomycetales; Saccharomycetaceae;
            Saccharomyces.
REFERENCE   1  (bases 1 to 1560)
  AUTHORS   Emori,Y., Tsukahara,T., Kawasaki,H., Ishiura,S., Sugita,H. and
            Suzuki,K.
  TITLE     Molecular cloning and functional analysis of three subunits of
            yeast proteasome
  JOURNAL   Mol. Cell. Biol. 11 (1), 344-353 (1991)
   PUBMED   1898763
REFERENCE   2
  AUTHORS   Emori,Y., Tsukahara,T., Kawasaki,H., Ishiura,S., Sugita,H. and
            Suzuki,K.
  TITLE     Molecular cloning and functional analysis of three subunits of
            yeast proteasome
  JOURNAL   Unpublished
REFERENCE   3  (bases 1 to 1560)
  AUTHORS   Emori,Y.
  TITLE     Direct Submission
  JOURNAL   Submitted (12-NOV-1990) Y. Emori, DEPT OF BIOPHYSICS &
            BIOCHEMISTRY, FACULTY OF SCIENCE, UNIVERSITY OF TOKYO, 7-3-1 HONGO,
            BUNKYO-KU, TOKYO 113, JAPAN
FEATURES             Location/Qualifiers
     source          1..1560
                     /organism="Saccharomyces cerevisiae"
                     /mol_type="genomic DNA"
                     /strain="X2180-1A"
                     /db_xref="taxon:4932"
     TATA_signal     306..309
     TATA_signal     353..356
     CDS             373..1149
                     /codon_start=1
                     /product="proteasome Y13 subunit"
                     /protein_id="CAA40054.1"
                     /db_xref="GI:506480"
                     /db_xref="GOA:P23638"
                     /db_xref="InterPro:IPR000426"
                     /db_xref="InterPro:IPR001353"
                     /db_xref="InterPro:IPR016050"
                     /db_xref="PDB:1FNT"
                     /db_xref="PDB:1G0U"
                     /db_xref="PDB:1G65"
                     /db_xref="PDB:1JD2"
                     /db_xref="PDB:1RYP"
                     /db_xref="PDB:1VSY"
                     /db_xref="PDB:1Z7Q"
                     /db_xref="PDB:2F16"
                     /db_xref="PDB:2FAK"
                     /db_xref="PDB:2GPL"
                     /db_xref="PDB:2ZCY"
                     /db_xref="PDB:3BDM"
                     /db_xref="PDB:3D29"
                     /db_xref="PDB:3DY3"
                     /db_xref="PDB:3DY4"
                     /db_xref="PDB:3E47"
                     /db_xref="PDB:3GPJ"
                     /db_xref="PDB:3GPT"
                     /db_xref="PDB:3GPW"
                     /db_xref="PDB:3HYE"
                     /db_xref="PDB:3L5Q"
                     /db_xref="SGD:S000003367"
                     /db_xref="UniProtKB/Swiss-Prot:P23638"
                     /translation="MGSRRYDSRTTIFSPEGRLYQVEYALESISHAGTAIGIMASDGI
                     VLAAERKVTSTLLEQDTSTEKLYKLNDKIAVAVAGLTADAEILINTARIHAQNYLKTY
                     NEDIPVEILVRRLSDIKQGYTQHGGLRPFGVSFIYAGYDDRYGYQLYTSNPSGNYTGW
                     KAISVGANTSAAQTLLQMDYKDDMKVDDAIELALKTLSKTTDSSALTYDRLEFATIRK
                     GANDGEVYQKIFKPQEIKDILVKTGITKKDEDEEADEDMK"
     polyA_signal    1395..1400
ORIGIN      
        1 ggaagaaggg tggtgttcta gcgatggtaa gattttgcca ttgcccaaag cccgataagc
       61 ctatcccact tcatgaatat ataacactcg cagagctcga tgttggagac agtgagtgag
      121 cagtgaattg ctcatgtttt ctctgcatcc tcatttaatg acaattagcc atgtaataac
      181 atcttgaggc agttaaatat tcgttaccct gcaggtggca aaaaatttat agaataaaag
      241 cataaaaaga tggatatcta tgtaataagg aaacattggc agagcgaaga gaacagactg
      301 ctttctataa aaagttttcg atcagtctct attttaataa ttgattattg gatatagtta
      361 gtagtgttaa acatgggttc cagaagatac gattccagga caacaatttt ctcccctgag
      421 ggacgtctat atcaggttga atacgcgcta gaatccattt cacatgcagg taccgcaatt
      481 gggattatgg catctgatgg gattgttctt gcagcagaac gcaaagtcac aagtacttta
      541 ctagaacaag acacctctac cgaaaaactt tataagttaa acgataaaat tgcggttgcc
      601 gttgctggac tgactgcaga tgcagaaatt ctaataaata cggctagaat tcacgctcaa
      661 aattacctta aaacctataa tgaagatata ccagtagaaa ttttggtgag aaggctaagt
      721 gatataaaac aaggttacac gcaacatggt ggtttaagac catttggtgt gtcctttatc
      781 tacgccggtt atgacgatag atacggttac caattgtata catctaatcc atcgggaaac
      841 tatacagggt ggaaggctat tagtgttggc gctaacacat cagcagcaca aaccctactt
      901 caaatggact acaaggatga tatgaaagtc gatgatgcca ttgaactggc tttaaaaacg
      961 ttatccaaaa ctaccgacag tagcgcgctg acttatgaca ggttggaatt tgctactatc
     1021 agaaagggtg ctaatgacgg agaagtgtat cagaagattt tcaagcctca agagataaag
     1081 gatatattgg taaagactgg tattaccaag aaggatgaag acgaagaagc tgatgaagat
     1141 atgaaataag gttggaaagt attgtttgac ttcatgctta tataaatatg tacgcataga
     1201 aacatatctc taaaattaaa agtgaaagaa aaaaatcgct aagattcccc tttcggggga
     1261 aggctgaaga aacttttttt tgcgcaaatt tcaagatcgg aatcgctcga gaggcaaata
     1321 taaaaaaggg cctgctcgtt tggcctactt gttgttggat tctaactcca atctaatttt
     1381 ggcagcactt caaaaataaa caaaagcgat gtctgcgtca ctagtgaatc gatcattgaa
     1441 aaatataagg aatgaattag aatttttgaa ggaatcaaac gtcatatcag gcgacatttt
     1501 cgaattaatc aatagcaagt tacctgagaa atgggatgga aaccaaagat cgccccaaaa
//

=============================================================================================================

How can I store this data in HDF5 format, and then view it?
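Whatever the target (HDF5, XML, or a database), the first step is to split the flat file into its parts. Below is a minimal, stdlib-only sketch that assumes the fixed column layout of the record above. The function name `parse_genbank` and the shape of its output are hypothetical choices for illustration. For real data, Biopython's `SeqIO.read(handle, "genbank")` is the established parser.

```python
def parse_genbank(text: str) -> dict:
    """Split one GenBank flat-file record into header, features and sequence.

    Minimal hand-rolled sketch: it relies on the fixed column layout of the
    example record (keywords in columns 1-12, feature keys in columns 6-21,
    qualifiers from column 22) and does not handle multi-line locations.
    """
    header = {}      # keyword -> list of values; sub-keywords become e.g. "SOURCE.ORGANISM"
    features = []    # [{"type": ..., "location": ..., "qualifiers": [(name, value), ...]}]
    seq_chunks = []
    section, parent, current = "header", None, None

    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("//"):          # end of record
            break
        if line.startswith("FEATURES"):
            section = "features"
            continue
        if line.startswith("ORIGIN"):
            section = "origin"
            continue

        if section == "header":
            keyword, value = line[:12].strip(), line[12:].strip()
            if keyword:
                if line.startswith(" "):   # indented sub-keyword, e.g. ORGANISM, AUTHORS
                    key = f"{parent}.{keyword}"
                else:                      # top-level keyword, e.g. LOCUS, REFERENCE
                    key = parent = keyword
                header.setdefault(key, []).append(value)
                current = key
            elif current:                  # continuation line: extend the last value
                header[current][-1] += " " + value

        elif section == "features":
            key, value = line[5:21].strip(), line[21:].strip()
            if key:                        # new feature, e.g. CDS 373..1149
                features.append({"type": key, "location": value, "qualifiers": []})
            elif value.startswith("/"):    # new qualifier, e.g. /product="..."
                name, _, qval = value[1:].partition("=")
                features[-1]["qualifiers"].append((name, qval.strip('"')))
            else:                          # wrapped qualifier value
                name, old = features[-1]["qualifiers"][-1]
                sep = "" if name == "translation" else " "
                features[-1]["qualifiers"][-1] = (name, old + sep + value.strip('"'))

        else:                              # ORIGIN block: drop the position numbers
            seq_chunks.append("".join(line.split()[1:]))

    return {"header": header, "features": features, "sequence": "".join(seq_chunks)}
```

Usage, assuming the record above is saved as a hypothetical `X56730.gb`: `rec = parse_genbank(open("X56730.gb").read())`, after which `rec["header"]["ACCESSION"]` is `["X56730"]` and `rec["sequence"]` holds the 1560 bases. A nested dict like this could then be mapped onto HDF5 groups (one per record or feature), attributes (short header fields and qualifiers) and datasets (the sequence), for example with h5py.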

Modified 5.3 years ago by Michael Dondrup; written 5.3 years ago by chaitu.

The example INSDC database entry is from GenBank and is shown in GenBank format, not the EMBL format used by EMBL-Bank. Compare how the entry is represented in each of the INSDC member databases (GenBank, EMBL-Bank, and DDBJ).
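The two layouts are easy to tell apart from the first line of an entry: a GenBank record opens with a `LOCUS` line (as in the question), while an EMBL-Bank record opens with an `ID` line, and every EMBL line starts with a two-letter line code. A minimal sketch of a check based only on that first line; the function name is hypothetical and it does not validate the rest of the entry:

```python
def sniff_insdc_format(text: str) -> str:
    """Guess the flat-file layout of an INSDC entry from its first non-blank line."""
    for line in text.splitlines():
        if not line.strip():
            continue                   # skip leading blank lines
        if line.startswith("LOCUS"):   # GenBank flat file, as in the question
            return "genbank"
        if line.startswith("ID   "):   # EMBL flat file: 2-letter line code + 3 spaces
            return "embl"
        break                          # only the first non-blank line is inspected
    return "unknown"


print(sniff_insdc_format("LOCUS       X56730                  1560 bp    DNA"))  # genbank
```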

Written 5.3 years ago by Hamish.

Thanks for the clarification; I always mix these up.

Written 5.3 years ago by Michael Dondrup.

What's the advantage of storing this in an HDF5 file? And do you just want to store it as one big string, or as a structured document?

Written 5.3 years ago by Pierre Lindenbaum.

I want to store it as a structured document, like XML.

Written 5.3 years ago by chaitu.

Why not use XML, then? Or an XML-based database like eXist?
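If XML is the goal, Python's standard library can write it directly. A minimal sketch with `xml.etree.ElementTree`, assuming a simplified record dict (accession, definition, features, sequence). The element and attribute names are an ad-hoc schema made up for this example, not NCBI's own format; NCBI also distributes GenBank records as XML (INSDSeq), which may be preferable to a home-made schema for real data.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record: dict) -> str:
    """Serialise a simplified GenBank-like record as an XML string.

    The element names (<record>, <feature>, <qualifier>, ...) form an ad-hoc
    schema invented for this sketch; they are not NCBI's XML format.
    """
    root = ET.Element("record", accession=record["accession"])
    ET.SubElement(root, "definition").text = record["definition"]
    features = ET.SubElement(root, "features")
    for feat in record["features"]:
        f = ET.SubElement(features, "feature", type=feat["type"], location=feat["location"])
        for name, value in feat["qualifiers"]:
            ET.SubElement(f, "qualifier", name=name).text = value
    ET.SubElement(root, "sequence").text = record["sequence"]
    return ET.tostring(root, encoding="unicode")


example = {
    "accession": "X56730",
    "definition": "Yeast gene for proteasome Y13 subunit.",
    "features": [{"type": "CDS", "location": "373..1149",
                  "qualifiers": [("product", "proteasome Y13 subunit"),
                                 ("protein_id", "CAA40054.1")]}],
    "sequence": "ggaagaaggg",  # truncated for brevity
}
print(record_to_xml(example))
```

Documents written this way could then be loaded into an XML database such as eXist and queried there.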

Written 5.3 years ago by Pierre Lindenbaum.
Michael Dondrup (Bergen, Norway) wrote, 5.3 years ago:

After formatting, you can see that this is GenBank-formatted data. You can generally store strings in HDF5, and you could possibly define a hierarchical data type to hold all the fields. However, HDF5 is best used when the data can be laid out as a large multidimensional array and random access to known coordinates is required in constant time. For structured data like this, you would gain no advantage by storing it in HDF5. You will most likely need to search for identifiers, or even run full-text searches over fields. For that purpose, database indices are a much better solution, and HDF5 is a poor one: as far as I know, HDF5 doesn't support indexing the way, e.g., MySQL or PostgreSQL do, so every search would have to scan through all fields.

In conclusion, storing this data in HDF5 offers no advantage; the appropriate storage is either an RDBMS or an XML database.
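To illustrate the indexing argument, here is a minimal sketch using SQLite (Python's built-in `sqlite3` module) as a stand-in for MySQL or PostgreSQL. The table layout and function names are hypothetical, and for brevity the CDS `/db_xref` qualifiers are attached directly to the entry. The index on `(db, ident)` lets a lookup such as "which entry cross-references PDB:1RYP?" avoid scanning every row.

```python
import sqlite3

# Hypothetical schema: one row per entry, one row per cross-reference.
SCHEMA = """
CREATE TABLE entry (
    accession  TEXT PRIMARY KEY,
    definition TEXT,
    sequence   TEXT
);
CREATE TABLE xref (
    accession TEXT REFERENCES entry(accession),
    db        TEXT,
    ident     TEXT
);
-- B-tree index: lookups by (db, ident) do not need to scan the whole table
CREATE INDEX xref_lookup ON xref (db, ident);
"""


def load_entry(conn, accession, definition, sequence, db_xrefs):
    """Insert one entry plus its "DB:ID" cross-references."""
    conn.execute("INSERT INTO entry VALUES (?, ?, ?)", (accession, definition, sequence))
    rows = [(accession, *xref.split(":", 1)) for xref in db_xrefs]
    conn.executemany("INSERT INTO xref VALUES (?, ?, ?)", rows)


def find_by_xref(conn, db, ident):
    """Return (accession, definition) for entries carrying the given cross-reference."""
    return conn.execute(
        "SELECT e.accession, e.definition FROM entry AS e "
        "JOIN xref AS x ON x.accession = e.accession "
        "WHERE x.db = ? AND x.ident = ?",
        (db, ident),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    load_entry(conn, "X56730", "Yeast gene for proteasome Y13 subunit.",
               "ggaagaaggg",  # truncated sequence, for brevity
               ["GI:506480", "PDB:1RYP", "UniProtKB/Swiss-Prot:P23638"])
    print(find_by_xref(conn, "PDB", "1RYP"))
    # [('X56730', 'Yeast gene for proteasome Y13 subunit.')]
```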

Modified 5.3 years ago; written 5.3 years ago by Michael Dondrup.

Thanks for explaining, but I just want to practice writing the code that converts that file into HDF5, for my own understanding. I have plans for it later.

Written 5.3 years ago by chaitu.

See my blog post http://plindenbaum.blogspot.fr/2011/07/storing-snps-in-hdf5-file-my-notebook.html for a "simple" example. Good luck.

Written 5.3 years ago by Pierre Lindenbaum.

Thanks a lot!

Written 5.3 years ago by chaitu.
Powered by Biostar version 2.3.0