Genome Specific Database (Gbrowse / Ensembl Type)
14.2 years ago
Darked89 4.6k

I am interested in your opinions about database systems used to store, query and visualize genomic sequence and annotations. I am talking about a ca. 600-700 Mb draft genome with a large number of contigs outside scaffolds. Yes, I know that annotating anything before reaching some quality milestones may be considered pointless, but I want to get the back end (DB) and the pipeline working well before that.

So far I have started testing GBrowse (1.70), been impressed by Ensembl as an end user, and looked at the (unsuitable) eye candy GenomeProjector: http://www.g-language.org/GenomeProjector/

I will appreciate any thoughts about ease of installation/maintenance and integration with annotation tools such as Apollo / Artemis.

Thanks

darked89

PS: There is no way to add proper tags (genome annotation database) to this post.

genome-annotation-database sequence • 4.4k views

Now fixed: the tagging rules have been relaxed. Please try again, thanks.

14.2 years ago

This is a much-debated topic: whether it is better to store sequences in a database or in simple flat files. I have never had to annotate a draft genome as you are doing, so I can't say which approach is best for you, but I would recommend flat files: you will have more support and tools, setup takes less time, and I have the feeling that this is the direction most projects are taking.

If you want to use a database, have a look at this post and at this column type, datatype-geometric.

If you want to try flat files, you will have to study the BED, GFF, and maybe BAM formats, along with VCF if you have SNPs. For example, if you use BED, you will be able to use BEDTools, which lets you merge and otherwise work with genomic features and is very fast. You may be surprised to learn that GBrowse uses only GFF files to store data; it has no DB backend.
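To give an idea of what "merging genomic features" means for BED data, here is a minimal Python sketch of the interval-merging idea. It is not BEDTools itself and not its implementation; the function names `parse_bed` and `merge_intervals` and the example contigs are made up for illustration. It assumes tab-separated BED lines with 0-based, half-open coordinates, and it merges overlapping or directly adjacent features.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end) using BED conventions: 0-based start, exclusive end.
Interval = Tuple[str, int, int]


def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Read the first three BED columns, skipping blank and header lines."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge features on the same sequence that overlap or touch."""
    merged: List[Interval] = []
    # Sorting by (chrom, start) lets a single pass find all overlaps.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical example data: three features on contig1, one on contig2.
example_bed = [
    "contig1\t100\t200",
    "contig1\t150\t300",
    "contig1\t400\t500",
    "contig2\t10\t20",
]
example_merged = merge_intervals(parse_bed(example_bed))
```

For a draft genome with many contigs, a dedicated tool will be much faster than this sketch, but the logic is the same: sort by sequence and start, then sweep once.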

Another alternative is HDF5, about which you may find some questions here. So, you have a lot of homework here :-)


Actually, GBrowse can use either flat, GFF-format files or a database backend. Use of in-memory GFF files is not recommended for anything other than very small datasets. GBrowse can use MySQL or BerkeleyDB and you can also employ the Chado and BioSQL schemas. GBrowse can also act as a DAS client. See the administration documentation.
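To illustrate why a database backend scales better than in-memory GFF, here is a minimal Python sketch that loads GFF lines into SQLite and runs an indexed region query. Everything in it is an assumption for illustration: the table layout, the function names `load_gff` and `features_in_region`, and the sample features are hypothetical, and this is not the schema used by GBrowse, Chado, or BioSQL.

```python
import sqlite3

# Hypothetical GFF3 lines (1-based, inclusive coordinates).
GFF_LINES = [
    "##gff-version 3",
    "contig1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene1",
    "contig1\texample\tgene\t5000\t6000\t.\t-\t.\tID=gene2",
    "contig2\texample\tgene\t100\t900\t.\t+\t.\tID=gene3",
]


def load_gff(conn, lines):
    """Create a simple feature table and load the nine-column GFF records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feature ("
        "seqid TEXT, source TEXT, type TEXT, "
        "start_pos INTEGER, end_pos INTEGER, strand TEXT, attributes TEXT)"
    )
    # The index lets region queries avoid scanning every feature.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS feature_region "
        "ON feature (seqid, start_pos, end_pos)"
    )
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a valid GFF feature line
        rows.append((fields[0], fields[1], fields[2],
                     int(fields[3]), int(fields[4]), fields[6], fields[8]))
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def features_in_region(conn, seqid, start, end):
    """Return features on seqid that overlap the inclusive range start..end."""
    cur = conn.execute(
        "SELECT type, start_pos, end_pos, strand, attributes FROM feature "
        "WHERE seqid = ? AND start_pos <= ? AND end_pos >= ? "
        "ORDER BY start_pos",
        (seqid, end, start),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # use a file path for a persistent store
load_gff(conn, GFF_LINES)
hits = features_in_region(conn, "contig1", 1500, 5500)
```

The point of the sketch is only that a browser asking for one region should not have to parse the whole annotation file each time; the real backends listed above handle this with their own schemas and adaptors.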
