Question

construction of a database

1

Entering edit mode

3.0 years ago

Debut ▴ 20

Hello I am looking to build a database to store bacterial genomes, I don't know if I should do it with a relational database management system (SQL) or a non-relational database management system (like NoSqL). Could someone please guide me?

sql noSQL neo4j database • 3.0k views

ADD COMMENT • link updated 13 months ago by Ram 43k • written 3.0 years ago by Debut ▴ 20

3

Entering edit mode

Some people, when presented problems in a relational database system think "I know, I'll use NoSQL". Now they have two problems

ADD REPLY • link 3.0 years ago by Pierre Lindenbaum 161k

1

Entering edit mode

Could someone please guide me?

it depends what kind of data you want to be stored

ADD REPLY • link 3.0 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

Thank you for your answer. genomic sequences and protein sequences organized by bacterial species

ADD REPLY • link 3.0 years ago by Debut ▴ 20

0

Entering edit mode

What kind of information you are planning to store?

ADD REPLY • link 3.0 years ago by Arup Ghosh 3.2k

0

Entering edit mode

Thank you for your answer. genomic sequences and protein sequences organized by bacterial species

ADD REPLY • link 3.0 years ago by Debut ▴ 20

0

Entering edit mode

genomic sequences and protein sequences organized by bacterial species

without any other requirement, you only need one table.

create table myDatabase (
taxon varchar(100),
seqtype enum(dna,prot),
seq longtext 
)

ADD REPLY • link 3.0 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

what kind of language is it please

ADD REPLY • link 3.0 years ago by Debut ▴ 20

3

Entering edit mode

I'm sorry - how are you exploring SQL/NoSQL options when a basic SQL statement is foreign to you? How do you intend on understanding architectural pros and cons of using one approach versus the other, and how would you implement the one you pick?

ADD REPLY • link 3.0 years ago by Ram 43k

0

Entering edit mode

It's a SQL statement that will create a table named myDatabase with three columns taxon, seqtype and seq.

ADD REPLY • link 3.0 years ago by Arup Ghosh 3.2k

0

Entering edit mode

organized by bacterial species

What exactly do you want to do? Be able to query the sequences/find specific ID's?

These sequence files generally are available as a single gene or protein multi-fasta file per genome and they could simply be put in species specific folders to segregate them if you just want to keep them organized.

ADD REPLY • link 3.0 years ago by GenoMax 142k

0

Entering edit mode

I would like to make a database that stores genomic sequences and protein sequences of several species of bacteria

ADD REPLY • link 2.9 years ago by Debut ▴ 20

0

Entering edit mode

I want to make a graphical interface, when a person presses on the name of the species in the dropdown list and then on a button, this person will be able to retrieve directly the fasta files with the genomic sequences. or he will be able to choose also to retrieve the protein sequences.

ADD REPLY • link 2.9 years ago by Debut ▴ 20

0

Entering edit mode

retrieve directly the fasta files with the genomic sequences

This obviously does not require a database if you are going to expect the person to get the full sequence file.

I want to make a graphical interface, when a person presses on the name of the species in the dropdown list and then on a button

What you are describing is a web application (that can have a database behind it). You could simply have deep links to the relevant files to NCBI sites (logical) or you could download the files locally (not trivial would need space).

NCBI already makes such a genome browser available here.

ADD REPLY • link 2.9 years ago by GenoMax 142k

0

Entering edit mode

the goal is to have the sequences already uploaded on the server and to retrieve them directly.

ADD REPLY • link 2.9 years ago by Debut ▴ 20

0

Entering edit mode

So, download from NCBI, upload to your local server and re-download them on demand? Why?

ADD REPLY • link 2.9 years ago by Ram 43k

0

Entering edit mode

no just download them locally and put them in a database and then make the GUI

ADD REPLY • link 2.9 years ago by Debut ▴ 20

0

Entering edit mode

Will the GUI allow the user to "retrieve" the sequence? If so, where from? In a broader context, why are you building this GUI application. Do bacterial genome browsers not exist?

ADD REPLY • link 2.9 years ago by Ram 43k

0

Entering edit mode

when we choose the species, behind it we get the sequences downloaded for this species

ADD REPLY • link 2.9 years ago by Debut ▴ 20

0

Entering edit mode

for example for klebsiella pneumoniae he asks me to have asks to have the 10 703 sequences in "https://www.ncbi.nlm.nih.gov/genome/browse/#!/prokaryotes/815/" at the level of "Genome Assembly and Annotation report (10703)".

ADD REPLY • link 2.9 years ago by Debut ▴ 20

0

Entering edit mode

Is this an exercise or assignment? If not I don't get the point of replicating GenBank.

ADD REPLY • link 2.9 years ago by Michael 54k

0

Entering edit mode

It's for an internship. They want to make a graphical interface for biologists that is easier than on NCBI

ADD REPLY • link 2.9 years ago by Debut ▴ 20

0

Entering edit mode

Do you have to build from scratch? There are existing solutions like Tripal that could be used.

For reasons of learning and didactics, I recommend sticking with (simple) SQL, it has many applications, is standardized, has a solid mathematical foundation, and you learn some transferable concepts for life. With noSQL, which is not really the "opposite of SQL" you can only use that exact same noSQL system you trained on.

I'd try SQLite for starters, that's good enough to study some basic concepts of SQL without complex client server architecture. Then develop a simplistic data model, and after that a simplistic web interface as a prototype. Discuss your solution with your stakeholders often and be agile in incorporating changes.

Expect a somewhat steep learning curve, but nothing (and in particular not noSQL solutions) can spare you this.

ADD REPLY • link 2.9 years ago by Michael 54k

0

Entering edit mode

I made this database in SQL. What do you think?

create table Genome ( ID interger NOT NULL AUTO_INCREMENT, name varchar (300) NOT NULL, sequence text, numero_accesion text);

create table gene ( ID integer NOT NULL AUTO_INCREMENT, name text, sequence text FOREIGN KEY (ID_genome) REFERENCES genome (ID));

create table proteine ( ID integer NOT NULL AUTO_INCREMENT, name text, sequence text FOREIGN KEY (ID_gene) REFERENCES gene (ID));

create table Espece( ID integer NOT NULL AUTO_INCREMENT PRIMARY KEY, name_latin text, FOREIGN KEY (ID_Genome) REFERENCES genome (ID));

how could I implement it with several fasta files please? thanks

ADD REPLY • link 2.9 years ago by Debut ▴ 20

0

Entering edit mode

Google "bulk upload to SQL". We cannot walk you through your internship.

ADD REPLY • link 2.9 years ago by Ram 43k

0

Entering edit mode

Might work for starters. I would:

only use one language for you definitions, not mixing french and english
I would reference species from genome not the other way around
Do you really want to store the whole genome sequence in the database? What if the genome consists of multiple replicons (also bacteria can have that feature)?

You can import csv formatted data into SQLite tables directly if the csv file has the same format as the table, you need to re-format fasta files using a script or you have to write your own importer using a SQLite database backend library.

After that it’s time to test your database and find out if and how you can make the queries you need.

ADD REPLY • link 2.9 years ago by Michael 54k