Question: Genome Specific Database (Gbrowse / Ensembl Type)
10.8 years ago by
I am interested in your opinions about database systems used to store, query, and visualize genomic sequence and annotations. I am talking about a draft genome of ca. 600-700 Mb with a large number of contigs outside scaffolds. Yes, I know that annotating anything before reaching some quality milestones may be considered pointless, but I want to get the back end (DB) and the pipeline working well before that.

So far I have started testing GBrowse (1.70), been impressed by Ensembl as an end user, and looked at GenomeProjector (eye candy, but unsuitable).

I would appreciate any thoughts about ease of installation and maintenance, and about integration with annotation tools such as Apollo or Artemis.



10.8 years ago by
Whether it is better to store sequences in a database or in simple flat files is a much-debated topic. I have never had to annotate draft genomes as you do, so I can't tell you which approach is best for you, but I would recommend flat files: you will have more support and tools, they take less time to set up, and I have the feeling that this is the direction most projects are taking.

If you want to use a database, have a look at this post and at this column type, datatype-geometric.
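To give a feel for what the database route involves, here is a minimal sketch of storing features in a table and querying them by region. It uses SQLite from the Python standard library only so that it runs without a server; a plain pair of integer columns with an overlap predicate stands in for a dedicated geometric or range column type. The table name, column names, and example features are all made up for illustration and are not the schema of GBrowse, Chado, or any other tool.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Hypothetical feature table; coordinates are 1-based and inclusive, as in GFF.
conn.execute(
    """CREATE TABLE feature (
           seqid      TEXT    NOT NULL,
           feat_start INTEGER NOT NULL,
           feat_end   INTEGER NOT NULL,
           type       TEXT    NOT NULL,
           name       TEXT
       )"""
)
# An index on sequence and start position keeps region queries from
# scanning every feature on every contig.
conn.execute("CREATE INDEX feature_region ON feature (seqid, feat_start)")

# Made-up example features.
conn.executemany(
    "INSERT INTO feature VALUES (?, ?, ?, ?, ?)",
    [
        ("scaffold_1", 1000, 2000, "gene", "geneA"),
        ("scaffold_1", 5000, 6000, "gene", "geneB"),
        ("contig_42", 1500, 1800, "gene", "geneC"),
    ],
)

def features_in_region(conn, seqid, start, end):
    """Return features on seqid that overlap the inclusive range start..end."""
    rows = conn.execute(
        "SELECT name, type, feat_start, feat_end FROM feature "
        "WHERE seqid = ? AND feat_start <= ? AND feat_end >= ? "
        "ORDER BY feat_start",
        (seqid, end, start),
    )
    return rows.fetchall()

hits = features_in_region(conn, "scaffold_1", 1500, 5200)
for name, ftype, fstart, fend in hits:
    print(name, ftype, fstart, fend)
```

Two features overlap the query when each starts before the other ends, which is what the WHERE clause checks. A database with a native range or geometric type can index that test directly instead of relying on the start column alone.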

If you want to try flat files, you will have to study the BED, GFF, and maybe BAM formats, along with VCF if you have SNPs. For example, if you use BED, you will be able to use BEDTools, which lets you merge and otherwise work with genomic features and is very fast. You may be surprised to know that GBrowse uses only GFF files to store data; it has no DB backend.
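As an illustration of how little machinery flat files need, here is a minimal sketch in plain Python of merging overlapping BED intervals, which is roughly what `bedtools merge` does with its default settings. It is not the BEDTools implementation, the function names are my own, and the contig names and coordinates are made up. It reads only the first three BED columns (chrom, 0-based start, exclusive end).

```python
from collections import namedtuple

Interval = namedtuple("Interval", ["chrom", "start", "end"])

def parse_bed(lines):
    """Yield Interval records from BED lines (0-based start, exclusive end)."""
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        yield Interval(chrom, int(start), int(end))

def merge_intervals(intervals):
    """Merge overlapping or directly adjacent intervals on each chromosome."""
    merged = []
    # Sorting the tuples orders them by chrom, then start, then end.
    for iv in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last.chrom == iv.chrom and iv.start <= last.end:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = last._replace(end=max(last.end, iv.end))
        else:
            merged.append(iv)
    return merged

# Made-up example data: two overlapping features and two isolated ones.
bed_text = """contig_1\t100\t200
contig_1\t150\t300
contig_1\t400\t500
contig_2\t10\t20
"""

merged = merge_intervals(parse_bed(bed_text.splitlines()))
for iv in merged:
    print(iv.chrom, iv.start, iv.end, sep="\t")
```

For a genome of this size you would use BEDTools itself rather than a script like this; the point is only that the format is simple enough to inspect and process with a few lines of code.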

Another alternative is HDF5, about which you may find some questions here. So, you have a lot of homework here :-)

Actually, GBrowse can use either flat, GFF-format files or a database backend. Use of in-memory GFF files is not recommended for anything other than very small datasets. GBrowse can use MySQL or BerkeleyDB and you can also employ the Chado and BioSQL schemas. GBrowse can also act as a DAS client. See the administration documentation.

10.7 years ago by
What are your needs?

I think Chado/Apollo is the way to go.

