construction of a database
0
1
Entering edit mode
3.0 years ago
Debut ▴ 20

Hello I am looking to build a database to store bacterial genomes, I don't know if I should do it with a relational database management system (SQL) or a non-relational database management system (like NoSqL). Could someone please guide me?

sql noSQL neo4j database • 3.0k views
ADD COMMENT
3
Entering edit mode

Some people, when presented problems in a relational database system think "I know, I'll use NoSQL". Now they have two problems

ADD REPLY
1
Entering edit mode

Could someone please guide me?

it depends what kind of data you want to be stored

ADD REPLY
0
Entering edit mode

Thank you for your answer. genomic sequences and protein sequences organized by bacterial species

ADD REPLY
0
Entering edit mode

What kind of information you are planning to store?

ADD REPLY
0
Entering edit mode

Thank you for your answer. genomic sequences and protein sequences organized by bacterial species

ADD REPLY
0
Entering edit mode

genomic sequences and protein sequences organized by bacterial species

without any other requirement, you only need one table.

create table myDatabase (
taxon varchar(100),
seqtype enum(dna,prot),
seq longtext 
)
ADD REPLY
0
Entering edit mode

what kind of language is it please

ADD REPLY
3
Entering edit mode

I'm sorry - how are you exploring SQL/NoSQL options when a basic SQL statement is foreign to you? How do you intend on understanding architectural pros and cons of using one approach versus the other, and how would you implement the one you pick?

ADD REPLY
0
Entering edit mode

It's a SQL statement that will create a table named myDatabase with three columns taxon, seqtype and seq.

ADD REPLY
0
Entering edit mode

organized by bacterial species

What exactly do you want to do? Be able to query the sequences/find specific ID's?

These sequence files generally are available as a single gene or protein multi-fasta file per genome and they could simply be put in species specific folders to segregate them if you just want to keep them organized.

ADD REPLY
0
Entering edit mode

I would like to make a database that stores genomic sequences and protein sequences of several species of bacteria

ADD REPLY
0
Entering edit mode

I want to make a graphical interface, when a person presses on the name of the species in the dropdown list and then on a button, this person will be able to retrieve directly the fasta files with the genomic sequences. or he will be able to choose also to retrieve the protein sequences.

ADD REPLY
0
Entering edit mode

retrieve directly the fasta files with the genomic sequences

This obviously does not require a database if you are going to expect the person to get the full sequence file.

I want to make a graphical interface, when a person presses on the name of the species in the dropdown list and then on a button

What you are describing is a web application (that can have a database behind it). You could simply have deep links to the relevant files to NCBI sites (logical) or you could download the files locally (not trivial would need space).

NCBI already makes such a genome browser available here.

ADD REPLY
0
Entering edit mode

the goal is to have the sequences already uploaded on the server and to retrieve them directly.

ADD REPLY
0
Entering edit mode

So, download from NCBI, upload to your local server and re-download them on demand? Why?

ADD REPLY
0
Entering edit mode

no just download them locally and put them in a database and then make the GUI

ADD REPLY
0
Entering edit mode

Will the GUI allow the user to "retrieve" the sequence? If so, where from? In a broader context, why are you building this GUI application. Do bacterial genome browsers not exist?

ADD REPLY
0
Entering edit mode

when we choose the species, behind it we get the sequences downloaded for this species

ADD REPLY
0
Entering edit mode

for example for klebsiella pneumoniae he asks me to have asks to have the 10 703 sequences in "https://www.ncbi.nlm.nih.gov/genome/browse/#!/prokaryotes/815/" at the level of "Genome Assembly and Annotation report (10703)".

ADD REPLY
0
Entering edit mode

Is this an exercise or assignment? If not I don't get the point of replicating GenBank.

ADD REPLY
0
Entering edit mode

It's for an internship. They want to make a graphical interface for biologists that is easier than on NCBI

ADD REPLY
0
Entering edit mode

Do you have to build from scratch? There are existing solutions like Tripal that could be used.

For reasons of learning and didactics, I recommend sticking with (simple) SQL, it has many applications, is standardized, has a solid mathematical foundation, and you learn some transferable concepts for life. With noSQL, which is not really the "opposite of SQL" you can only use that exact same noSQL system you trained on.

I'd try SQLite for starters, that's good enough to study some basic concepts of SQL without complex client server architecture. Then develop a simplistic data model, and after that a simplistic web interface as a prototype. Discuss your solution with your stakeholders often and be agile in incorporating changes.

Expect a somewhat steep learning curve, but nothing (and in particular not noSQL solutions) can spare you this.

ADD REPLY
0
Entering edit mode

I made this database in SQL. What do you think?

create table Genome ( ID interger NOT NULL AUTO_INCREMENT, name varchar (300) NOT NULL, sequence text, numero_accesion text);

create table gene ( ID integer NOT NULL AUTO_INCREMENT, name text, sequence text FOREIGN KEY (ID_genome) REFERENCES genome (ID));

create table proteine ( ID integer NOT NULL AUTO_INCREMENT, name text, sequence text FOREIGN KEY (ID_gene) REFERENCES gene (ID));

create table Espece( ID integer NOT NULL AUTO_INCREMENT PRIMARY KEY, name_latin text, FOREIGN KEY (ID_Genome) REFERENCES genome (ID));

how could I implement it with several fasta files please? thanks

ADD REPLY
0
Entering edit mode

Google "bulk upload to SQL". We cannot walk you through your internship.

ADD REPLY
0
Entering edit mode

Might work for starters. I would:

  • only use one language for you definitions, not mixing french and english
  • I would reference species from genome not the other way around
  • Do you really want to store the whole genome sequence in the database? What if the genome consists of multiple replicons (also bacteria can have that feature)?

You can import csv formatted data into SQLite tables directly if the csv file has the same format as the table, you need to re-format fasta files using a script or you have to write your own importer using a SQLite database backend library.

After that it’s time to test your database and find out if and how you can make the queries you need.

ADD REPLY

Login before adding your answer.

Traffic: 1436 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6