Question: How to download a list of compounds from the PubChem database using the command line
krishavc92 (0), University of Madras, Chennai, wrote 4.3 years ago:

How can I download a list of compounds from the PubChem database from the command line using Perl?

help alignment software error • 8.0k views
modified 4.3 years ago by ivivek_ngs (5.0k) • written 4.3 years ago by krishavc92 (0)

You can find PubChem data on NCBI's FTP site. I am not sure whether you are looking for just a list of names.

written 4.3 years ago by genomax (91k)

Thanks for your kind suggestion. I don't want all compounds. I have a list of compounds, and I need to download only those using the command line.

written 4.3 years ago by krishavc92 (0)

I listed the online string-search methods in my answer. genomax is pointing you to the database files that hold the metadata. If you want a programmatic approach, read your input list of compound IDs into an array, scan the metadata file, and wherever you find a hit, retrieve the information and write it to an output file in tab-delimited format, appending a row for each remaining compound. That is the algorithm your Perl script should follow. Any language available on Linux will do (bash, Python, or Perl), whichever you are comfortable with.

If you are looking for a programmatic solution, please post the code you have written and tell us where it goes wrong so that experts can help. Otherwise, Stack Overflow is always there for programming suggestions and solutions.
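A minimal Python sketch of that scan, offered as an illustration rather than a finished solution. It assumes the metadata file is tab-delimited with the compound ID in the first column (an assumption about the layout; check the file you actually download). The function names `load_ids` and `filter_metadata` and the demo rows are hypothetical.

```python
import io


def load_ids(lines):
    """Collect compound IDs (one per line) into a set, skipping blank lines."""
    return {line.strip() for line in lines if line.strip()}


def filter_metadata(metadata_lines, wanted_ids, out_handle):
    """Write every row whose first column is a wanted ID to out_handle.

    Rows are written tab-delimited. Returns the set of wanted IDs that
    never appeared, so that misses can be reported to the user.
    """
    missing = set(wanted_ids)
    for line in metadata_lines:
        if not line.strip():
            continue  # ignore empty lines
        fields = line.rstrip("\r\n").split("\t")
        if fields[0] in wanted_ids:
            out_handle.write("\t".join(fields) + "\n")
            missing.discard(fields[0])
    return missing


# Demo with in-memory stand-ins for the ID list and the metadata file.
# The rows are illustrative only; the layout (ID, then title) is assumed.
wanted = load_ids(io.StringIO("2244\n\n3672\n999999999\n"))
metadata = io.StringIO("2244\tAspirin\n2519\tCaffeine\n3672\tIbuprofen\n")
hits = io.StringIO()
missing = filter_metadata(metadata, wanted, hits)

print(hits.getvalue(), end="")
print("not found:", sorted(missing))
```

Because the IDs are held in a set and the metadata is read one line at a time, the same code should cope with a very large metadata file; for real use, replace the `io.StringIO` objects with `open(...)` handles.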

You can also look at this thread which shows how to mirror the DB and work with it.

modified 4.3 years ago • written 4.3 years ago by ivivek_ngs (5.0k)

Please remove the redundant question at this link.

Please wait patiently; others will take a look and reply when they have bandwidth from their own work.

written 4.3 years ago by ivivek_ngs (5.0k)
ivivek_ngs (5.0k), Seattle, WA, USA, wrote 4.3 years ago:

You can check

  1. This link from PubChem, which works on compound IDs and shows how to download.
  2. Alternatively, take a look at ChemMineTools.
  3. Then there is ligdig, whose batch search returns hits from both ChEMBL and NCBI PubChem.

The above links should work for you for the work you want to do.

A simple Google search does not hurt; it only enriches your data-mining skills. We are here to guide and to teach, so that you can help others in the future.

written 4.3 years ago by ivivek_ngs (5.0k)

Thanks for your answer. I already checked the first and second links and did not get anything. Is there anything else?

written 4.3 years ago by krishavc92 (0)

Then please modify the question to state what you are trying to achieve, so that we can understand exactly what you want. Did you look at the third link? What information do you want to retrieve? Have you seen that the PubChem link I gave also has a download-as-CSV option? It shows how to submit a list of compounds and download the results through the PubChem service. Under genomax's reply to your question above, I have also linked a thread that is in line with your query; please take a look at it. If none of these meets your requirements, please update the question with more details so that we can come up with a more targeted solution.

modified 4.3 years ago • written 4.3 years ago by ivivek_ngs (5.0k)

I have a list of around 500 compounds that I need to download in SDF format. I hope it is now clear exactly what I want. Can you help me?

written 4.3 years ago by krishavc92 (0)

@vchris_ngs has already given you pointers on how to do this. Combine that information with the FTP link I had posted.
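For roughly 500 compounds, one possible route that avoids downloading the bulk files is PubChem's PUG REST web service. It is not named in the replies above, so treat it as an alternative rather than the method the linked pages describe. The sketch below assumes the list contains PubChem compound IDs (CIDs) rather than names; the function names, the batch size of 100, and the half-second pause are illustrative choices, not PubChem recommendations.

```python
import time
import urllib.request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def sdf_url(cids):
    """Build a PUG REST URL that returns the given CIDs as one SDF document."""
    cid_list = ",".join(str(cid) for cid in cids)
    return f"{PUG_REST}/compound/cid/{cid_list}/SDF"


def download_sdf(cids, out_path, batch_size=100, pause=0.5):
    """Fetch the records batch by batch and write them all to out_path.

    Requires network access. Each response is itself a sequence of SDF
    records, so the responses are simply written one after another.
    """
    with open(out_path, "wb") as out:
        for batch in batches(list(cids), batch_size):
            with urllib.request.urlopen(sdf_url(batch)) as response:
                out.write(response.read())
            time.sleep(pause)  # keep the request rate low
```

With the CIDs stored one per line in a file (here a hypothetical `cids.txt`), `download_sdf([line.strip() for line in open("cids.txt") if line.strip()], "compounds.sdf")` would write all records to a single SDF file. Batching keeps each URL short; if a batch fails, `urlopen` raises an exception, so a real script would want error handling around that call.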

written 4.3 years ago by genomax (91k)

I think enough has been said, with detailed pointers. What the OP wants is a finished result, which would take time for us to produce; in that case I believe the OP should show some code first. Otherwise, plenty of detail from earlier threads on the same topic has been provided as a starting point, which the OP can use to come up with a programmatic approach. This is my viewpoint. The question remains open if any other expert can give a more targeted and precise answer.

written 4.3 years ago by ivivek_ngs (5.0k)

Take a look at these threads, which have some useful answers that you could try to implement on your system.

Parsing Pubchem Compound Records - Take a look at all three answers, which explain in detail how to work with the metadata file to retrieve information. There is also a mention of a tool, PubCouch. See if you can use it for your own purposes.

Given Several Compound Reference Numbers, How To Get The Molecular Files - Here, too, you will see some approaches mentioned.

You have to come up with your own algorithm for the task, either in a scripting language or with SQL queries against a master table built from the SDF file. Given your list of compounds as input, it should fetch the matching records from the big SDF file and produce an output SDF file containing all the information you asked for.
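One way to sketch the scripting route in Python: stream the big SDF file record by record and keep only the records whose compound ID is in your list. This assumes each record ends with a `$$$$` line (standard for SDF) and carries its ID in a `PUBCHEM_COMPOUND_CID` data field, as PubChem's compound SDF files do; the function names and the tiny demo records are hypothetical.

```python
import io


def iter_sdf_records(handle):
    """Yield one SDF record at a time, including its '$$$$' terminator line."""
    record = []
    for line in handle:
        record.append(line)
        if line.rstrip("\r\n") == "$$$$":
            yield "".join(record)
            record = []


def record_cid(record, tag="PUBCHEM_COMPOUND_CID"):
    """Return the value on the line after '> <tag>', or None if absent."""
    lines = record.splitlines()
    for i, line in enumerate(lines[:-1]):
        if line.startswith(">") and f"<{tag}>" in line:
            return lines[i + 1].strip()
    return None


def extract_records(sdf_handle, wanted_ids, out_handle):
    """Copy records whose CID is in wanted_ids; return how many were copied."""
    written = 0
    for record in iter_sdf_records(sdf_handle):
        if record_cid(record) in wanted_ids:
            out_handle.write(record)
            written += 1
    return written


# Two stand-in records (not real molecule blocks), just to exercise the filter.
demo_sdf = io.StringIO(
    "2244\n  demo\n\nM  END\n> <PUBCHEM_COMPOUND_CID>\n2244\n\n$$$$\n"
    "2519\n  demo\n\nM  END\n> <PUBCHEM_COMPOUND_CID>\n2519\n\n$$$$\n"
)
kept = io.StringIO()
count = extract_records(demo_sdf, {"2519"}, kept)
print(count, "record(s) kept")
```

Only one record is held in memory at a time, so the size of the input file should not matter. Bulk files from the FTP site are typically gzip-compressed, in which case the input handle can be opened with `gzip.open(path, "rt")` from the standard library instead of `open`.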

written 4.3 years ago by ivivek_ngs (5.0k)