Question: Genome-Specific Database (GBrowse / Ensembl Type)
Darked89 (Barcelona, Spain), 10.8 years ago, wrote:

I am interested in your opinions about database systems used to store, query and visualize genomic sequence and annotations. I am talking about a ca. 600-700 Mb draft genome with a large number of contigs not placed in scaffolds. Yep, I know that annotating anything before reaching some quality milestones may be considered pointless, but I want to get the back end (DB) and the pipeline working well before that.

So far I have started testing GBrowse (1.70), have been impressed by Ensembl as an end user, and have looked at the eye-candy (but unsuitable) GenomeProjector: http://www.g-language.org/GenomeProjector/.

I would appreciate any thoughts about ease of installation and maintenance, and about integration with annotation tools such as Apollo and Artemis.

Thanks

darked89

PS: There is no way to add proper tags (genome, annotation, database) to this post.


Now fixed: the tagging rules have been relaxed, please try again. Thanks.

Reply by Istvan Albert, 10.8 years ago
Giovanni M Dall'Olio (London, UK), 10.8 years ago, wrote:

Whether it is better to store sequences in a database or in simple flat files is a much-debated topic. I have never had to annotate a draft genome as you are doing, so I can't tell you which approach is best for you, but I would recommend flat files: you will have more support and tools, they take less time to set up, and I have the feeling that this is the direction most projects are taking.

In case you want to use a database, have a look at this post and at PostgreSQL's geometric column types (datatype-geometric).
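As a minimal illustration of the database route, here is a sketch that swaps in SQLite (through Python's built-in `sqlite3` module) for PostgreSQL's geometric types: features are stored as plain integer start/end columns, and the overlap query uses the standard test `start < query_end AND end > query_start`. The table, column and function names (`feature`, `build_feature_db`, `overlapping`) are hypothetical, and coordinates are assumed to be 0-based and half-open, as in BED.

```python
import sqlite3

def build_feature_db(features):
    """Load (contig, start, end, name) tuples into an in-memory SQLite table.

    Assumption: coordinates are 0-based, half-open (BED-style)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feature (contig TEXT, start_pos INTEGER, end_pos INTEGER, name TEXT)"
    )
    # Composite index so a query can be narrowed to one contig and a start range.
    conn.execute("CREATE INDEX idx_feature_loc ON feature (contig, start_pos, end_pos)")
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?)", features)
    conn.commit()
    return conn

def overlapping(conn, contig, q_start, q_end):
    """Return names of features overlapping the half-open query [q_start, q_end)."""
    rows = conn.execute(
        "SELECT name FROM feature "
        "WHERE contig = ? AND start_pos < ? AND end_pos > ? "
        "ORDER BY start_pos",
        (contig, q_end, q_start),
    )
    return [name for (name,) in rows]

conn = build_feature_db([
    ("contig_1", 100, 500, "geneA"),
    ("contig_1", 450, 900, "geneB"),
    ("contig_2", 0, 300, "geneC"),
])
print(overlapping(conn, "contig_1", 480, 520))  # ['geneA', 'geneB']
```

A plain B-tree index like this is fine for modest data; for very large feature sets, spatial or binned indexing (the motivation behind geometric types) scales better.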

In case you want to try flat files, you will have to study the BED, GFF and maybe BAM formats, along with VCF if you have SNPs. For example, if you use BED, you will be able to use BEDTools, a very fast toolset for merging and otherwise working with genomic features. You may be surprised to know that GBrowse uses only GFF files to store data; it has no DB backend.
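To make the "merge genomic features" idea concrete, here is a small pure-Python sketch that reads the first three BED columns and merges overlapping or book-ended intervals, roughly what `bedtools merge` does with its default settings. It is only an illustration of the operation, not BEDTools itself; `parse_bed` and `merge_intervals` are hypothetical names, and only the `chrom`, `start`, `end` columns are kept.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end); BED is 0-based, half-open

def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Read chrom/start/end, skipping blank, comment, track and browser lines."""
    intervals = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals

def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or book-ended intervals that lie on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        # Book-ended (start == previous end) counts as touching and is merged.
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last_chrom, last_start, last_end = merged[-1]
            merged[-1] = (last_chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged

bed = [
    "contig_1\t100\t200\n",
    "contig_1\t150\t300\n",
    "contig_1\t300\t350\n",
    "contig_2\t10\t20\n",
]
print(merge_intervals(parse_bed(bed)))
# [('contig_1', 100, 350), ('contig_2', 10, 20)]
```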

Another alternative is HDF5; you can find some questions about it on this site. So you have a lot of homework :-)


Actually, GBrowse can use either flat GFF-format files or a database backend. Use of in-memory GFF files is not recommended for anything other than very small datasets. GBrowse can use MySQL or BerkeleyDB, and you can also employ the Chado and BioSQL schemas. GBrowse can also act as a DAS client. See the administration documentation.

Reply by Neilfws, 10.7 years ago
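For the flat-file GFF route, it helps to know what the format looks like: GFF3 has nine tab-separated columns, 1-based inclusive coordinates, and `key=value` attributes separated by semicolons, with percent-encoded special characters. The sketch below is a minimal standard-library parser under those assumptions; `GffFeature` and `parse_gff3` are hypothetical names, and it does no validation, keeps multi-valued attributes as raw comma-separated strings, and does not attempt to read an embedded `##FASTA` section. It also shows the conversion to BED's 0-based, half-open coordinates.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, Tuple
from urllib.parse import unquote

@dataclass
class GffFeature:
    seqid: str
    source: str
    feature_type: str
    start: int  # 1-based, inclusive (as written in the file)
    end: int    # 1-based, inclusive
    strand: str
    attributes: Dict[str, str] = field(default_factory=dict)

    def to_bed_coords(self) -> Tuple[int, int]:
        """Return (start, end) as 0-based, half-open coordinates, as used by BED."""
        return self.start - 1, self.end

def parse_gff3(lines: Iterable[str]) -> Iterator[GffFeature]:
    for line in lines:
        line = line.rstrip("\n")
        # Skips comments and directives such as "##gff-version 3".
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line (e.g. FASTA after ##FASTA)
        seqid, source, ftype, start, end, _score, strand, _phase, attr = cols
        attributes = {}
        for pair in attr.split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                # Multi-valued attributes stay as one comma-separated string.
                attributes[unquote(key)] = unquote(value)
        yield GffFeature(unquote(seqid), source, ftype, int(start), int(end),
                         strand, attributes)

gff = [
    "##gff-version 3\n",
    "contig_1\tmaker\tgene\t101\t500\t.\t+\t.\tID=gene1;Name=my%20gene\n",
]
for feat in parse_gff3(gff):
    print(feat.feature_type, feat.attributes["Name"], feat.to_bed_coords())
# gene my gene (100, 500)
```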
Yannick Wurm (Queen Mary University of London), 10.7 years ago, wrote:

What are your needs?

http://stackoverflow.com/questions/1890285/are-there-any-existing-solutions-for-creating-a-generic-dna-sequence-database-wit/1893358

I think Chado/Apollo is the way to go.
