Question: What Are The Advantages Of Data Management In Databases?
4.8 years ago by
Bergen, Norway
jobinv1.1k wrote:

I recently described our group's current status in this post: We have the minimum of everything required for bioinformatics analysis; why do we need more?

This is a follow-up to one of those points, namely the data management issue. Could someone give me good arguments for why it is better to switch to database-based data management? What would a database let me do that I could not do by just keeping everything in files?

database • 2.4k views
ADD COMMENTlink modified 4.8 years ago by Istvan Albert ♦♦ 77k • written 4.8 years ago by jobinv1.1k

Well, you will have to define more clearly what you would want to store in such a database. Generally speaking, databases are good for relational data.

Also, I don't think it makes sense to explicitly use either.

ADD REPLYlink written 4.8 years ago by David Westergaard1.4k

Perhaps not a strictly bioinformatics-related question, this, but it is so tightly connected to what we need to do in bioinformatics that I still consider it appropriate for this forum. Please let me know if I am wrong about this.

ADD REPLYlink written 4.8 years ago by jobinv1.1k

You posed and answered your question in the same sentence: "not a strictly bioinformatics-related question" yet "so tightly connected to what we need to do in bioinformatics". I consider activities connected to bioinformatics to be the subject of bioinformatics questions. So, I think this question is completely appropriate here.

As to the question itself, without a database, what method would you suggest for making queries across all your projects? A script that scans directories and reads standardized flat files? The "management" part implies the ability to gain and navigate some kind of overview. I employ both methods due to a generally un-directed and historically messy design process, but seldom hear about how to "manage" data overviews without a database.
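For comparison, the flat-file approach mentioned above (a script that scans directories and reads standardized files) could be sketched roughly as follows. This is only an illustration: the one-folder-per-project layout, the `samples.csv` file name, and the `sample`/`condition` columns are assumptions made up for the example, not anything described in this thread.

```python
import csv
import tempfile
from pathlib import Path


def find_samples(root, condition):
    """Scan every <project>/samples.csv under root and collect matching samples."""
    hits = []
    for path in sorted(Path(root).glob("*/samples.csv")):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                if row.get("condition") == condition:
                    hits.append((path.parent.name, row["sample"]))
    return hits


def build_demo_tree(root):
    """Create two tiny hypothetical projects so the sketch runs on its own."""
    demo = {
        "projectA": [("s1", "control"), ("s2", "heatshock")],
        "projectB": [("s3", "heatshock")],
    }
    for project, rows in demo.items():
        folder = Path(root) / project
        folder.mkdir(parents=True)
        with (folder / "samples.csv").open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["sample", "condition"])  # the assumed standard header
            writer.writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    build_demo_tree(tmp)
    heatshock_hits = find_samples(tmp, "heatshock")
    print(heatshock_hits)
```

This works only as long as every project really follows the same file layout, and each new kind of question needs another function like `find_samples`.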

ADD REPLYlink written 4.8 years ago by seidel6.5k

A friend of mine made a suggestion to me just earlier today, that if I'm talking about a rare query that I'm interested in doing across projects, then it might be better to just stick with flat files. He was suggesting that maintaining a database with all its hassles might be a bit excessive for what I would need it for.

ADD REPLYlink written 4.8 years ago by jobinv1.1k
"database vs. flat files"; "Flat file vs database - speed?"; etc.

ADD REPLYlink written 4.8 years ago by Pierre Lindenbaum110k
4.8 years ago by
Ido Tamir4.8k wrote:

I guess by "database" you mean a relational database.

  • Pros:

    1. less data redundancy (if normalized); this:
      • reduces errors
      • enables consistent data changes (e.g. renaming of one experimental condition across multiple experiments)
    2. a standard query and reporting language across all your data
    3. error checking on data entry (completeness of records, wrong data types)
    4. integrity on data changes (ACID for most relational databases)
    5. tools allow relatively easy construction of GUIs from database models
    6. most programming languages have drivers for RDBMSs, so you have one data model and can query/report/update with R, Java, Python, etc.
    7. all data in one place (compared to data in folders). This allows you to integrate data across experiments for checking of systematic trends e.g. quality control
    8. evolution of the data model will be consistent across all the data. This is a little more difficult than an ad hoc change within the current project, but the consistency pays off.
  • Cons:
    1. Some data structures are more difficult to represent in a relational database, e.g. trees
    2. Needs some thought (and experience) at the beginning to implement well
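A few of the pros above (less redundancy, error checking on entry, consistent renaming) can be illustrated with a small sketch using Python's built-in `sqlite3` module. The `condition` and `sample` tables and all values are hypothetical and chosen only for the example; a real schema would need the up-front thought mentioned under the cons.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE condition (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE sample (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    experiment   TEXT NOT NULL,
    condition_id INTEGER NOT NULL REFERENCES condition(id)
);
""")

# The condition name is stored once; samples only point to it.
conn.execute("INSERT INTO condition (id, name) VALUES (1, 'heat shock')")
conn.executemany(
    "INSERT INTO sample (name, experiment, condition_id) VALUES (?, ?, ?)",
    [("s1", "exp1", 1), ("s2", "exp2", 1)],
)
conn.commit()

# Error checking on entry: a sample pointing to an unknown condition is refused.
try:
    conn.execute(
        "INSERT INTO sample (name, experiment, condition_id) VALUES ('s3', 'exp3', 99)"
    )
    rejected = False
except sqlite3.IntegrityError:
    conn.rollback()
    rejected = True

# Consistent change: one UPDATE renames the condition for every experiment.
with conn:
    conn.execute("UPDATE condition SET name = 'heat_shock_37C' WHERE id = 1")

rows = conn.execute(
    "SELECT s.name, s.experiment, c.name "
    "FROM sample s JOIN condition c ON c.id = s.condition_id "
    "ORDER BY s.name"
).fetchall()
print(rejected, rows)
```

With the condition name kept in a single row, the rename cannot leave one experiment with the old spelling, which is the kind of inconsistency that tends to creep into per-project files.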

For most of my analysis projects I also have a sample CSV table, but as a project grows I start to feel the pain (mostly data inconsistencies). We also have some RDBMSs for the real stuff of course, but the additional data that I have for the individual projects (additional sample annotation from the researcher) is not entered into them, because it has no place there. I do query the RDBMSs to check for some consistency, though.

ADD COMMENTlink modified 4.8 years ago • written 4.8 years ago by Ido Tamir4.8k
4.8 years ago by
University Park, USA
Istvan Albert ♦♦ 77k wrote:

Be careful not to think in terms of (false) choices. Storing data in databases does not preclude you from also keeping them around in flat files.
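As a rough sketch of using both at once, a flat file can stay the primary copy while being loaded into a throwaway database whenever a query is needed. The file name, columns, and numbers below are invented for the example, and `sqlite3` is just one convenient choice of database.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical flat file; in practice this would be an existing project file.
    flat_file = Path(tmp) / "samples.csv"
    flat_file.write_text("sample,reads\ns1,1200\ns2,800\n")

    # Load a copy into an in-memory database; the file itself is only read.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample (name TEXT PRIMARY KEY, reads INTEGER)")
    with flat_file.open(newline="") as handle:
        records = [(row["sample"], int(row["reads"])) for row in csv.DictReader(handle)]
    conn.executemany("INSERT INTO sample VALUES (?, ?)", records)
    conn.commit()

    total_reads = conn.execute("SELECT SUM(reads) FROM sample").fetchone()[0]
    file_still_there = flat_file.exists()
    conn.close()

print(total_reads, file_still_there)
```

Because the database is rebuilt from the file each time, there is nothing extra to maintain, at the cost of reloading the data for every session.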

Databases are designed to represent/query information stored in a predetermined format. They work best when used in a specialized context and for solving a well defined use case.

In fact you probably would need to create different databases for different use cases.

The more "unified" and "global" your database, the more untenable and difficult the task of creating and maintaining it.

ADD COMMENTlink modified 4.8 years ago • written 4.8 years ago by Istvan Albert ♦♦ 77k