Import Pubchem Compound Sdfs Into Oracle?
1
1
Entering edit mode
11.4 years ago
jocsch ▴ 10

Hello,

i want to import the complete pubchem compound sdf files into oracle. Is there a way to do this? I didn't found any solution that works.

At the moment i try to migrate my mysql pubchem database (which i has created a year ago) to oracle.

But i want to do the direct import into oracle.

Wit best,

bladepit

• 2.2k views
ADD COMMENT
2
Entering edit mode
11.4 years ago

I would use a stream-based xslt transformation to import all the records. Here is a simple stylesheet storing one XML record in a key/value table:


<xsl:stylesheet xmlns:xsl="&lt;a href="http://www.w3.org/1999/XSL/Transform" "="" rel="nofollow">http://www.w3.org/1999/XSL/Transform' version='1.0' 
    xmlns:c="http://www.ncbi.nlm.nih.gov"
    > 
<xsl:output method="xml" omit-xml-declaration="yes"/>
<xsl:template match="/">
<xsl:apply-templates select="//c:PC-Compound"/>
</xsl:template>



<xsl:template match="c:PC-Compound">
insert into table(id,content)
values(
<xsl:value-of select="c:PC-Compound_id/c:PC-CompoundType/c:PC-CompoundType_id/c:PC-CompoundType_id_cid"/>
,
'<xsl:copy-of select="."/>'
);
</xsl:template>

</xsl:stylesheet>

(note: the quotes in the XML should be escaped)

Here is an example using my old tool xsltstream: ( http://plindenbaum.blogspot.fr/2010/02/processing-large-xml-documents-with.html )

$ curl -s "ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound/CURRENT-Full/XML/Compound_000000001_000025000.xml.gz" | gunzip -c |  java -jar xsltstream.jar -x stylesheet.xsl -q "PC-Compound"

insert into table(id,content)
values(
1
,
'<PC-Compound xmlns="&lt;a href=" http:="" www.ncbi.nlm.nih.gov"="" rel="nofollow">http://www.ncbi.nlm.nih.gov" xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
    <PC-Compound_id>
      <PC-CompoundType>
        <PC-CompoundType_id>
          <PC-CompoundType_id_cid>1</PC-CompoundType_id_cid>
        </PC-CompoundType_id>
      </PC-CompoundType>
     (...)
        <PC-Count_isotope-atom>0</PC-Count_isotope-atom>
        <PC-Count_covalent-unit>1</PC-Count_covalent-unit>
        <PC-Count_tautomers>1</PC-Count_tautomers>
      </PC-Count>
    </PC-Compound_count>
  </PC-Compound>'
);

insert into table(id,content)
values(
2
,
'<PC-Compound xmlns="&lt;a href=" http:="" www.ncbi.nlm.nih.gov"="" rel="nofollow">http://www.ncbi.nlm.nih.gov">
    <PC-Compound_id>
ADD COMMENT

Login before adding your answer.

Traffic: 2932 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6