I was wondering if there are any documents showing how the Ensembl marts are created from the main Ensembl databases. Specifically, I was hoping for documents describing which tables were selected as main tables for the marts and how the dimension tables were mapped to the main tables.
As an example, the ensemblmart61 contains a main table for human named translationmain (this is an abbreviation of the name, but it's obvious which one I mean). This table has a field called proteinfeatureprintsbool, which is essentially a boolean field indicating whether a protein translation is associated with a row in the PRINTS dimension table proteinfeatureprints_dm. If the translation does have a row in this dimension table, then I am guessing it has a PRINTS domain in it!
The core database itself, however, has a table called 'translation' which represents, well, a translation. Translations are linked to rows in a table called 'protein_feature', which in turn has a foreign key called analysis_id that links to an 'analysis' table with fields 'database' and 'program'. So in this schema, a translation is associated with a PRINTS annotation if it is linked to a 'protein_feature' record which is in turn linked to an 'analysis' record with the text 'PRINTS' somewhere in the database field, the program field, or both.
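To make concrete the kind of mapping I have in mind, here is a rough sketch using Python's sqlite3 module and toy data. It is only my guess at the logic, not how the Ensembl marts are actually built: the column names (translation_id, hit_name), the underscored spellings of the mart tables, the sample rows, and the 1/0 encoding of the boolean are all my own assumptions for illustration.

```python
import sqlite3


def build_toy_mart():
    """Build a toy 'core' schema in memory and derive mart-like tables from it."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Toy stand-ins for the core tables. Columns and rows are made up.
    cur.executescript("""
        CREATE TABLE translation (translation_id INTEGER PRIMARY KEY);
        CREATE TABLE analysis (
            analysis_id INTEGER PRIMARY KEY,
            "database"  TEXT,
            program     TEXT
        );
        CREATE TABLE protein_feature (
            protein_feature_id INTEGER PRIMARY KEY,
            translation_id     INTEGER,
            analysis_id        INTEGER,
            hit_name           TEXT
        );
        INSERT INTO translation VALUES (1), (2), (3);
        INSERT INTO analysis VALUES
            (10, 'PRINTS', 'FingerPRINTScan'),
            (11, 'Pfam',   'hmmpfam');
        INSERT INTO protein_feature VALUES
            (100, 1, 10, 'PR00001'),
            (101, 2, 11, 'PF00001'),
            (102, 3, 10, 'PR00002'),
            (103, 3, 11, 'PF00002');
    """)

    cur.executescript("""
        -- Hypothetical dimension table: one row per PRINTS feature,
        -- keyed by the translation it belongs to.
        CREATE TABLE protein_feature_prints_dm AS
        SELECT pf.translation_id, pf.hit_name
        FROM protein_feature pf
        JOIN analysis a ON a.analysis_id = pf.analysis_id
        WHERE a."database" LIKE '%PRINTS%' OR a.program LIKE '%PRINTS%';

        -- Hypothetical main table: one row per translation, with a flag
        -- that is 1 when the dimension table has a matching row, else 0.
        CREATE TABLE translation_main AS
        SELECT t.translation_id,
               EXISTS (SELECT 1
                       FROM protein_feature_prints_dm d
                       WHERE d.translation_id = t.translation_id)
                   AS protein_feature_prints_bool
        FROM translation t;
    """)
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_toy_mart()
    query = ("SELECT translation_id, protein_feature_prints_bool "
             "FROM translation_main ORDER BY translation_id")
    for row in conn.execute(query):
        print(row)
```

With this toy data, translations 1 and 3 get the flag set and translation 2 does not. What I cannot work out is where the equivalent of these two CREATE TABLE statements is specified for the real marts.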
I am interested in how the BioMart software is configured with 'rules' to create the mart schema from the database schema. Is there a configuration file containing these rules that I could look at? Is there a worked example? As an academic exercise I'd like to recreate the Ensembl marts. I have the BioMart user manual, but even with that document I do not know how to recreate them.
I am NOT specifically interested in protein domains. I used the PRINTS example purely for illustrative purposes, as I thought it was a straightforward example. I am interested in how you specify the 'rules' to get from a schema to a mart.
Thanks a lot.