Question: Pubchem Database Into Mysql
5
9.0 years ago by
Jochen Schreiber50 wrote:

Hello everybody,

I hope this is the right forum for this question.

I want to download the PubChem Substance database and load all of its information into a MySQL database. Is this possible, and if so, how?

My second question: is there a script that updates the database automatically?

I could not find anything on this topic.

With best regards, Jochen Schreiber

database mysql • 6.0k views
ADD COMMENTlink modified 4.5 years ago by ostrokach290 • written 9.0 years ago by Jochen Schreiber50
1
9.0 years ago by
Pascal1.5k
Barcelona
Pascal1.5k wrote:

Have a look at moldb5; it shows at least how to download SDF files from PubChem and import them into a MySQL DB.

ADD COMMENTlink written 9.0 years ago by Pascal1.5k
1
9.0 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum131k wrote:

There is an XML schema (XSD) for the PubChem XML files: ftp://ftp.ncbi.nih.gov/pubchem/specifications/pug.xsd

You could generate the tables and import the data with an "XSD to SQL" converter. See http://stackoverflow.com/questions/138575/how-can-i-create-database-tables-from-xsd-files

ADD COMMENTlink modified 13 months ago by RamRS30k • written 9.0 years ago by Pierre Lindenbaum131k

I want to automatically download SDF files for a list (.xl) of compounds. Is there a Python or R script for this?

ADD REPLYlink written 2.7 years ago by desai.contacts0
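
For a list of compounds like the one asked about in the comment above, one option is PubChem's PUG REST service, which returns one record as SDF per request. Below is a minimal Python sketch, assuming the spreadsheet has first been exported to plain identifiers (CIDs or names). The function names, the output file naming and the pause length are my own illustrative choices, not part of any PubChem tooling.

```python
import os
import time
import urllib.parse
import urllib.request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def sdf_url(identifier, namespace="cid"):
    """Build a PUG REST URL that returns one compound record as SDF."""
    quoted = urllib.parse.quote(str(identifier), safe="")
    return f"{PUG_REST}/compound/{namespace}/{quoted}/SDF"


def download_sdfs(identifiers, namespace="cid", out_dir=".", pause=0.5):
    """Fetch each identifier in turn and write <identifier>.sdf into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for ident in identifiers:
        with urllib.request.urlopen(sdf_url(ident, namespace)) as response:
            data = response.read()
        with open(os.path.join(out_dir, f"{ident}.sdf"), "wb") as handle:
            handle.write(data)
        time.sleep(pause)  # arbitrary pause, chosen to be gentle on the server
```

Something like `download_sdfs([2244, 3672], out_dir="sdf")` would fetch by CID; passing `namespace="name"` looks compounds up by name instead. An identifier that PubChem cannot resolve makes `urlopen` raise an HTTP error, which this sketch does not catch.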
1
9.0 years ago by
Wolf Ihlenfeldt150 wrote:

The question is of course why you'd want to do that. As mentioned in your own question, updates are a constant hassle.

There are a couple of interfaces available that hide the complexities of the PUG and EUtils gateways into PubChem, so you can work locally with the current PubChem data as if it were a regular file or local database. That is much more convenient (I am guessing that your queries are not top secret...).

ADD COMMENTlink written 9.0 years ago by Wolf Ihlenfeldt150

Could you provide some additional information and links to some of these interfaces you mention?

ADD REPLYlink written 8.9 years ago by Malachi Griffith18k

I have just sent an email with a PowerPoint presentation.

ADD REPLYlink written 8.9 years ago by Wolf Ihlenfeldt150

But the email address in your profile at genome.wustl.edu bounces. How can I reach you?

ADD REPLYlink written 8.9 years ago by Wolf Ihlenfeldt150
0
8.9 years ago by
Yogesh Pandit500
United States
Yogesh Pandit500 wrote:

You can download all the SDF files for the latest PubChem Substance release using any FTP client from

ftp://ftp.ncbi.nlm.nih.gov/pubchem/Substance/CURRENT-Full/SDF/

Then you can use ChemAxon's JChem Manager to import all the SDFs into a MySQL database.

http://www.chemaxon.com/jchem/doc/admin/
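
If you would rather script the download step than use an interactive FTP client, a minimal Python sketch with the standard `ftplib` module could look like the following. It assumes the server still accepts anonymous FTP and that the data files in that directory end in `.sdf.gz`; both are assumptions to check against the actual listing. The function names are illustrative only.

```python
import os
from ftplib import FTP

HOST = "ftp.ncbi.nlm.nih.gov"
SDF_DIR = "/pubchem/Substance/CURRENT-Full/SDF"


def wanted(names, suffix=".sdf.gz"):
    """Keep only the compressed SDF files (assumed suffix), in a stable order."""
    return sorted(n for n in names if n.endswith(suffix))


def mirror_sdf(dest="sdf", host=HOST, remote_dir=SDF_DIR):
    """Download every SDF file that is not already present in dest."""
    os.makedirs(dest, exist_ok=True)
    with FTP(host) as ftp:
        ftp.login()  # no arguments means anonymous login
        ftp.cwd(remote_dir)
        for name in wanted(ftp.nlst()):
            target = os.path.join(dest, name)
            if os.path.exists(target):
                continue  # fetched on an earlier run
            partial = target + ".part"
            with open(partial, "wb") as handle:
                ftp.retrbinary(f"RETR {name}", handle.write)
            os.replace(partial, target)  # rename only after a complete transfer
```

Because finished files are skipped, rerunning `mirror_sdf()` after an interrupted transfer only fetches what is still missing.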

ADD COMMENTlink modified 13 months ago by RamRS30k • written 8.9 years ago by Yogesh Pandit500

ChemAxon is commercial software, and with their academic license you are not allowed to create "shared databases"...

ADD REPLYlink written 4.5 years ago by ostrokach290
0
4.5 years ago by
ostrokach290
Canada
ostrokach290 wrote:

There is only one way to import large amounts of data into a standard database (e.g. MySQL, PostgreSQL, etc.):

  • process the data to create CSV files that your database can understand
  • use the LOAD DATA LOCAL INFILE ... command (or equivalent) to load those files into the database
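
The second step can be sketched as a small Python helper that builds the MySQL statement for a comma-separated file with optionally quoted fields. The table and column names in the usage line are hypothetical, and the helper does no escaping of its arguments, so it is only suitable for names and paths you control. Running the statement needs a client or driver with `LOCAL INFILE` enabled, which is left out here.

```python
def load_data_sql(csv_path, table, columns):
    """Build a MySQL LOAD DATA statement for a comma-separated file."""
    cols = ", ".join(columns)
    return (
        f"LOAD DATA LOCAL INFILE '{csv_path}' INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n' "  # MySQL receives the two characters \n
        f"({cols})"
    )
```

For example, `load_data_sql("/tmp/substance.csv", "substance", ["sid", "source_name"])` gives a statement you can paste into the `mysql` client.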

If your data comes as XML (:o), you have to process that data using an XML library like lxml in Python, and create CSV files that contain all the information that you need. "XSD to SQL" converters don't work with the complex schemas that most XML files use, and XML databases (e.g. BaseX, eXist) are immature and have limits on the size of the files that you can import.
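
A minimal sketch of the XML-to-CSV step, using the standard library's `xml.etree.ElementTree.iterparse` as a stand-in for lxml (lxml offers a similar `iterparse`). The tag names `record`, `id` and `name` are hypothetical placeholders, not PubChem's real element names; real PubChem XML is namespaced and nested much more deeply.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(xml_file, csv_file, record_tag, fields):
    """Stream records out of xml_file and write one CSV row per record."""
    writer = csv.writer(csv_file, lineterminator="\n")
    count = 0
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag != record_tag:
            continue
        # Each field is looked up as a direct child of the record element.
        writer.writerow([elem.findtext(f, default="") for f in fields])
        elem.clear()  # drop the finished record's children to limit memory use
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical two-record document, just to show the call.
    demo = io.BytesIO(
        b"<root><record><id>1</id><name>water</name></record>"
        b"<record><id>2</id></record></root>"
    )
    out = io.StringIO()
    xml_to_csv(demo, out, "record", ["id", "name"])
    print(out.getvalue())
```

Streaming with `iterparse` matters because the full dumps are far too large to parse into a single tree in memory.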

The same applies to SDF files.
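
For SDF, a rough sketch of the same idea: read the data items (`> <TAG>` followed by value lines) from each record, which ends with a `$$$$` line, and write the selected tags as CSV columns. The structure block itself is ignored here. The tag names in the usage note are the ones I believe PubChem uses, so check them against a real file.

```python
import csv


def sdf_records(lines):
    """Yield one {tag: value} dict per SDF record (records end with '$$$$')."""
    record, tag, value = {}, None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("$$$$"):
            if tag is not None:  # value not closed by a blank line
                record[tag] = "\n".join(value)
            yield record
            record, tag, value = {}, None, []
        elif line.startswith(">") and "<" in line:
            # Data header such as "> <SOME_TAG>": keep the text between < and >.
            tag = line[line.index("<") + 1:line.rindex(">")]
            value = []
        elif tag is not None:
            if line == "":
                record[tag] = "\n".join(value)
                tag = None
            else:
                value.append(line)


def sdf_to_csv(lines, csv_file, tags):
    """Write one CSV row per record, with one column per requested tag."""
    writer = csv.writer(csv_file, lineterminator="\n")
    count = 0
    for record in sdf_records(lines):
        writer.writerow([record.get(t, "") for t in tags])
        count += 1
    return count
```

With a file opened through `gzip.open(path, "rt")`, a call such as `sdf_to_csv(handle, out, ["PUBCHEM_SUBSTANCE_ID", "PUBCHEM_EXT_DATASOURCE_NAME"])` produces rows ready for bulk loading. Multi-line values are quoted by the `csv` module, so the load command has to allow quoted fields.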

ADD COMMENTlink modified 4.5 years ago • written 4.5 years ago by ostrokach290