Forum: Which database software is better for bioinformatics web development? MySQL or MongoDB?
0
gravatar for zhangjk21
8 weeks ago by
zhangjk210
China/Wuhan/HUST
zhangjk210 wrote:

Once I did a bioinformatics project using MongoDB, but I'm kind of confused which database software is more suitable for bioinformatics web development. Sure, MySQL, relational database, is great. However, what's the difference between the performance of these two kinds of database, relational and non-relational? PS: Does it depends on whether we use PHP or not?

ADD COMMENTlink modified 8 weeks ago by Eric Lim1.1k • written 8 weeks ago by zhangjk210
10
gravatar for Michael Dondrup
8 weeks ago by
Bergen, Norway
Michael Dondrup45k wrote:

This is a very general question, in particular what is bioinformatics web development? It really depends on the application case and requirements, and certainly the type of data to store. Note: Database Management Systems (DBMS) are a back-end technology meaning they are used for storing and retrieving the data in a reliable and performant way. web development refers to front-end technology, that is the presentation of the application to the user and how the application is interacting with the user. Deciding on the technology should be the last step in the design process of a software.

Some questions to guide the decision process:

  • How is your data structured, can the data be represented in relational format?
  • Have you modeled the data using UML/ER diagrams?
  • Volume of your data?
  • Do you update/read often or rarely?
  • Do you update/read in parallel?
  • Do you need transactions, atomic actions, ACID, to ensure consistency?
  • Do you need foreign keys, indexing, stored procedures, views?
  • Distributed?
  • Other performance boosting required?
  • Replication required?
  • Do you need a well-defined Query language (SQL, XPATH) or can you hardcode a few queries?
  • can you setup a DB server in client server infrastructure?
  • do you need remote connections (CLI, TCP, web-services) to the database?
  • which authentication/authorization model do you use for your application?
  • which backup strategy do you prefer/policies are in place at the site of deployment?
  • Do language bindings to preferred frame-work/coding tools/languages exist at the appropriate level of abstraction?
  • Which support model do you prefer? (Some might prefer a commercial support model)

Candidates:

  • PostgreSQL (most feature-rich, covers practically everything named above, richest SQL, including Document database/NoSQL functions, will work most of the time)
  • MySQL, MariaDB, etc. (very similar in feature richness, maybe a bit higher performance)
  • SQLite (file based SQL database management, no server required, great for testing, should fit most bioinformatics needs, unless many concurrent updates, easy to share as a file - no dump required, easiest way to start and to learn SQL)
  • Oracle (if you need a commercial DBMS)
  • Large variable matrices (microarrays, RNA-seq data, other measurements) or array-shaped data with variable dimensions or loosely defined structure are not such a good fit for Relational Databases, HDF5 or similar formats could be a reasonable alternative, possibly keeping metadata in MongoDB.
ADD COMMENTlink modified 8 weeks ago • written 8 weeks ago by Michael Dondrup45k

Great idea! There are so many things I never learned. Thanks, it really help me a lot.

ADD REPLYlink written 8 weeks ago by zhangjk210

in light of the 'bioinformatics' side of the question, I think it also becomes important to consider where you plan to run the app from. A lot of bioinformatics work takes place on shared systems like HPC's, which have restricted user-rights and software access; in these cases, I end up going with SQLite a lot because its often the easiest to get running there.

ADD REPLYlink written 8 weeks ago by steve1.8k
3
gravatar for Raony Guimarães
8 weeks ago by
Dublin / Ireland
Raony Guimarães860 wrote:

You could use PostgreSQL and JsonB to get the best of both worlds: relational and non-relational features.

ADD COMMENTlink written 8 weeks ago by Raony Guimarães860

Thank you very much! I will give it a try.

ADD REPLYlink written 8 weeks ago by zhangjk210
2
gravatar for Eric Lim
8 weeks ago by
Eric Lim1.1k
Boston
Eric Lim1.1k wrote:

Micheal has given and asked some really excellent questions, so I'll just share what we've been doing.

Our approach is developing and deploying events driven web applications using serverless technology (https://aws.amazon.com/lambda/ ) and couple these applications with various serverless database services (https://aws.amazon.com/dynamodb/, https://aws.amazon.com/rds/aurora/, and https://cloud.google.com/bigquery/ ). The idea behind this approach is to reduce the group's burden in maintaining running servers so we can focus on the development and analytics.

ADD COMMENTlink modified 8 weeks ago by genomax58k • written 8 weeks ago by Eric Lim1.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1645 users visited in the last hour