Biomart Sql Fixup
3
2
Entering edit mode
13.0 years ago
Jfsebastian ▴ 30

Hi,

I'm a sysadmin working for some bioinformaticians at Manchester University.

Today's chore would be trying to fix up some XML and compressed XML from a mildly customised Biomart-style mysql database we're moving off hardware soon to be 'retired'.

Essentially the problem is that the database contains some now invalid embedded absolute URLs that need altering -- naturally I could export and sed, but what to do about the compressed XML?

Does anyone know if there is a relatively painless way of doing this -- perhaps using the java utilities: 'marteditor' and friends?

Many thanks in advance,

[JFS]

biomart • 2.6k views
ADD COMMENT
5
Entering edit mode
13.0 years ago
Joachim ★ 2.9k

Please do not edit the XML manually, but have a look at the MartEditor tool that is part of BioMart 0.7 (http://www.biomart.org/user-docs.pdf). Having said that, I am not quite sure what you mean by compressed XML in this context. You can either get the configuration of the BioMart from the database itself, or you load a plain-vanilla XML-file that you previously saved with MartEditor. Perhaps you are referring to the XML-files needed in MartBuilder, but those are only required during the creation of a mart.

Have a look under "Meta-Data Transfer" in http://bergmanlab.smith.man.ac.uk/?p=35. The blog-post is perhaps interesting for you anyway, because it describes the complete workflow of setting up a mart, which includes the use of MartBuilder and MartEditor.

ADD COMMENT
1
Entering edit mode

Okay, I understand what you mean by compressed XML now. Please do not alter the meta_* tables yourself, but use MartEditor to load the data from these tables instead. You can just connect with MartEditor to that mart database and it will automatically retrieve the data from those tables. MartEditor also writes your modified config back into those tables.

ADD REPLY
0
Entering edit mode

Hi Joachim: thanks for responding.

The compressed xml lives in a table 'meta_confxmldm' as field three, 'compressedxml' of type 'longblob'. Field two is 'xml', i.e. uncompressed xml -- also a 'longblob'.

But OK -- I'll follow the suggestions here and refrain from hand hacking the raw SQL.

ADD REPLY
0
Entering edit mode

Hi Joachim: thanks for responding. The compressed xml lives in a table 'meta_confxmldm' as field three, 'compressedxml' of type 'longblob'. Field two is 'xml', i.e. uncompressed xml -- also a 'longblob'. The fourth field is actually a hash of either field two or three (or both?). But OK -- I'll follow the suggestions here and refrain from hand hacking the raw SQL. – jfsebastian 0 secs ago

ADD REPLY
1
Entering edit mode
13.0 years ago
Ben Lange ▴ 210

I'd agree that it's not real clear what you mean by compressed Xml. The BioMart User Doc referenced by Jochim, references a compression module for Apache which shouldn't impact the data in the actual xml column.

Consider doing this with just SQL.
Here's some pseudo code. [?][?] Create table FixedXml Clustered Index - Link back to the table that originally hosted the xml. FixedXml

while records affected > 0 begin insert into FixedXml ([Clustered Index], [Fixed Xml]) select top 1000 [Clustered Index], Replace(SourceXml, 'BadUrl', 'GoodUrl') from SourceTable A left join FixedXml B on A.[ClusteredIndex] = B.[ClusteredIndex] where B.[ClusteredIndex] is null end [?][?]

Once you've got a fixed version of the xml, update the original table and drop this fixedxml table.

ADD COMMENT
1
Entering edit mode
13.0 years ago
Jfsebastian ▴ 30

So "MartEditor" kept bombing out for me: looks like an effectively now unobtainable version was used to create the Mart back in 2007 or so.

However, it was possible to reverse engineer matters:

mysql> desc meta_conf__xml__dm;
+----------------+----------+------+-----+---------+-------+
| Field          | Type     | Null | Key | Default | Extra |
+----------------+----------+------+-----+---------+-------+ 
| dataset_id_key | int(11)  | NO   | PRI | NULL    |       | 
| xml            | longblob | YES  |     | NULL    |       | 
| compressed_xml | longblob | YES  |     | NULL    |       | 
| message_digest | blob     | YES  |     | NULL    |       | 
+----------------+----------+------+-----+---------+-------+

here compressed_xml is just gzip of xml and message_digest is the (binary) md5: this can be verified by dumping the blobs to files with select ... into dumpfile ... and examining those files.

So after suitably sed-ing my dumped out xml, I could recreate the compressed_xml and message_digest with gzip -c and openssl dgst -md5 -binary respectively and then slurp back in with an update ... LOAD_FILE('') -- for some reason I had to temporarily set SElinux to permissive before that last step would work.

A similar surgery is possible upon meta_template__xml__dm.

After wiping the mod_perl cached registry and giving httpd a kick, amazingly this fixed my problem. I'm not suggesting anyone out there attempt this -- but sometimes you need to nuke from orbit.

ADD COMMENT

Login before adding your answer.

Traffic: 1846 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6