How to implement a database with several FASTA files
2.9 years ago
Debut ▴ 20

Hi, I need your help. I have several FASTA files and need to build a database of bacterial genomes (I am using Klebsiella pneumoniae as an example). I had planned a database with several classes (tables) for genomes, species and genus, but it was suggested that I use a single one instead:

CREATE TABLE myDatabase (
    taxon   VARCHAR(100),
    seqtype ENUM('dna', 'prot'),
    seq     LONGTEXT
);

I don't know how to populate this database with thousands of genomes. Inserting them one statement at a time would take a very long time, for example: INSERT INTO myDatabase VALUES ('klebsiella', 'dna', 'AATU...');

Moderator Edit: Previous thread for context: construction of a database

Database NCBI FASTA • 2.2k views

Storing sequence data in an SQL database is not a great idea. In any case, you should be able to use your database's bulk import facilities to load many records at once, as long as you have them in a tab- or comma-separated file. Search for "bulk import from file to SQL" and add your SQL provider (MySQL, SQLite, etc.) as another keyword.
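
As a rough illustration of what a bulk load can look like, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite rather than MySQL (so ENUM becomes a CHECK constraint and LONGTEXT becomes TEXT) and assumes each input line holds taxon, seqtype and seq separated by tabs. The function name bulk_load_tsv is made up for this example.

```python
import sqlite3


def bulk_load_tsv(conn, lines):
    """Insert tab-separated (taxon, seqtype, seq) lines in a single transaction.

    `lines` can be an open file handle or any iterable of strings.
    Returns the total number of rows in the table afterwards.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS myDatabase ("
        "taxon TEXT, "
        "seqtype TEXT CHECK (seqtype IN ('dna', 'prot')), "
        "seq TEXT)"
    )
    # Generator: rows are streamed to SQLite instead of being held in memory.
    rows = (
        line.rstrip("\n").split("\t")
        for line in lines
        if line.strip()  # skip blank lines
    )
    with conn:  # one commit at the end, rollback if any row fails
        conn.executemany("INSERT INTO myDatabase VALUES (?, ?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM myDatabase").fetchone()[0]
```

With a real file this would be called as `with open("genomes.tsv") as fh: bulk_load_tsv(conn, fh)`, where genomes.tsv is a placeholder name. Sending all rows through one executemany call inside one transaction avoids the per-statement overhead of thousands of separate INSERTs. MySQL has its own mechanism for this (LOAD DATA INFILE), which is not shown here.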

Thank you for your answer. Are you advising me to build the database with NoSQL instead?

I did not give such advice. I asked a question on how you're making this decision between SQL/NoSQL without understanding the architecture of either.

These are FASTA files, so they are not tab- or comma-separated files.

But they can easily be converted to tab-delimited files...
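
For instance, a conversion along these lines might do it. This is only a sketch in plain Python: the function names read_fasta and fasta_to_tsv are invented here, and the taxon and seqtype values are assumed to be supplied by the caller, since a FASTA header does not reliably contain them. To match the three-column table from the question, the record ID from the header is not written out.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:  # emit the last record
        yield header, "".join(chunks)


def fasta_to_tsv(lines, out, taxon, seqtype="dna"):
    """Write one 'taxon<TAB>seqtype<TAB>seq' line per FASTA record.

    Returns the number of records written.
    """
    count = 0
    for _header, seq in read_fasta(lines):
        out.write(f"{taxon}\t{seqtype}\t{seq}\n")
        count += 1
    return count
```

Both `lines` and `out` can be open file handles, so a whole genome file is streamed record by record rather than loaded at once.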

2.9 years ago

Don't store the sequence in the database, only the metadata.

The FASTA files, in turn, can be concatenated into a single multi-FASTA file and then indexed for fast access.

https://github.com/mdshw5/pyfaidx
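
To show the idea behind indexing, here is a simplified sketch that uses only the Python standard library. It is not pyfaidx's API or index format: it just records the byte offset of each header line so that a record can be read without scanning the whole file. The function names build_index and fetch are invented for this example, and it assumes each record ID (the first word after ">") is unique.

```python
def build_index(path):
    """Map each record ID to the byte offset of its header line."""
    index = {}
    offset = 0
    with open(path, "rb") as handle:
        for line in handle:
            if line.startswith(b">"):
                parts = line[1:].split()
                if parts:  # ignore headers with no ID
                    index[parts[0].decode()] = offset
            offset += len(line)
    return index


def fetch(path, index, seq_id):
    """Return the sequence for seq_id by seeking straight to its record."""
    chunks = []
    with open(path, "rb") as handle:
        handle.seek(index[seq_id])
        handle.readline()  # skip the header line
        for line in handle:
            if line.startswith(b">"):  # next record starts here
                break
            chunks.append(line.strip())
    return b"".join(chunks).decode()
```

The offsets (or just the file path and record ID) could then be kept in the metadata table next to the taxon, so the database stays small while the sequences remain in the FASTA file. A dedicated tool such as pyfaidx is the better choice for real data, since it also supports fetching subsequences.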
