Question: Import Pubchem Compound Sdfs Into Oracle?
1
gravatar for jocsch
6.2 years ago by
jocsch10
jocsch10 wrote:

Hello,

i want to import the complete pubchem compound sdf files into oracle. Is there a way to do this? I didn't found any solution that works.

At the moment i try to migrate my mysql pubchem database (which i has created a year ago) to oracle.

But i want to do the direct import into oracle.

Wit best,

bladepit

• 1.2k views
ADD COMMENTlink modified 6.2 years ago by Pierre Lindenbaum116k • written 6.2 years ago by jocsch10
2
gravatar for Pierre Lindenbaum
6.2 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum116k wrote:

I would use a stream-based xslt transformation to import all the records. Here is a simple stylesheet storing one XML record in a key/value table:


<xsl:stylesheet xmlns:xsl="&lt;a href="http://www.w3.org/1999/XSL/Transform" "="" rel="nofollow">http://www.w3.org/1999/XSL/Transform' version='1.0' 
    xmlns:c="http://www.ncbi.nlm.nih.gov"
    > 
<xsl:output method="xml" omit-xml-declaration="yes"/>
<xsl:template match="/">
<xsl:apply-templates select="//c:PC-Compound"/>
</xsl:template>



<xsl:template match="c:PC-Compound">
insert into table(id,content)
values(
<xsl:value-of select="c:PC-Compound_id/c:PC-CompoundType/c:PC-CompoundType_id/c:PC-CompoundType_id_cid"/>
,
'<xsl:copy-of select="."/>'
);
</xsl:template>

</xsl:stylesheet>

(note: the quotes in the XML should be escaped)

Here is an example using my old tool xsltstream: ( http://plindenbaum.blogspot.fr/2010/02/processing-large-xml-documents-with.html )

$ curl -s "ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound/CURRENT-Full/XML/Compound_000000001_000025000.xml.gz" | gunzip -c |  java -jar xsltstream.jar -x stylesheet.xsl -q "PC-Compound"

insert into table(id,content)
values(
1
,
'<PC-Compound xmlns="&lt;a href=" http:="" www.ncbi.nlm.nih.gov"="" rel="nofollow">http://www.ncbi.nlm.nih.gov" xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
    <PC-Compound_id>
      <PC-CompoundType>
        <PC-CompoundType_id>
          <PC-CompoundType_id_cid>1</PC-CompoundType_id_cid>
        </PC-CompoundType_id>
      </PC-CompoundType>
     (...)
        <PC-Count_isotope-atom>0</PC-Count_isotope-atom>
        <PC-Count_covalent-unit>1</PC-Count_covalent-unit>
        <PC-Count_tautomers>1</PC-Count_tautomers>
      </PC-Count>
    </PC-Compound_count>
  </PC-Compound>'
);

insert into table(id,content)
values(
2
,
'<PC-Compound xmlns="&lt;a href=" http:="" www.ncbi.nlm.nih.gov"="" rel="nofollow">http://www.ncbi.nlm.nih.gov">
    <PC-Compound_id>
ADD COMMENTlink written 6.2 years ago by Pierre Lindenbaum116k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1013 users visited in the last hour