How to download a list of compounds from the PubChem database using the command line
7.9 years ago
krishavc92 ▴ 10

How can I download a list of compounds from the PubChem database using the command line (Perl)?

software-error alignment • 16k views

You can find PubChem data on NCBI's FTP site. I'm not sure whether you are looking for just a list of names.


Thanks for your kind suggestion. I don't want all compounds. I have a list of compounds, and I need to download only those using the command line.


I listed the online methods for string search in my answer; genomax is pointing you to the database of metadata. If you want a programmatic approach, take the metadata file, load your input list of compound IDs into an array, and scan through the file. Wherever you find a hit, retrieve the information and write it to an output file in tab-delimited format, appending the results for each further compound. That is the algorithmic approach for your Perl script. Anything available on Linux can be used (bash, Python, Perl), whichever you are comfortable with. If you are looking for a programmatic solution, please post your own code and tell us where it goes wrong so that experts can help; otherwise, Stack Overflow is always there for programming suggestions and solutions.
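
As a rough illustration of that scan, here is a minimal sketch. It assumes a tab-delimited metadata file whose first column is the CID and an ID list with one CID per line; the function name `filter_by_cid` and the file names in the usage line are hypothetical, and the real column layout depends on which file you take from the FTP site.

```shell
#!/usr/bin/env bash
# filter_by_cid IDS_FILE METADATA_TSV
# Prints every row of METADATA_TSV whose first tab-separated field appears
# in IDS_FILE (one ID per line). The column layout is an assumption.
filter_by_cid() {
    awk -F'\t' '
        NR == FNR { want[$1]; next }   # first file: remember each wanted ID
        $1 in want                     # second file: print rows with a wanted ID
    ' "$1" "$2"
}
```

Something like `filter_by_cid ids.txt metadata.tsv > hits.tsv` would then write the matching rows to a tab-delimited output file. The sketch expects a non-empty ID list without trailing spaces or Windows line endings.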

You can also look at this thread which shows how to mirror the DB and work with it.


Please remove the redundant question at this link.

Please wait patiently for others to take a look and reply when they have bandwidth from their own work.

7.9 years ago
ivivek_ngs ★ 5.2k

You can check:

  1. This link from PubChem, which works on compound IDs and shows how to download.
  2. Alternatively, take a look at ChemMineTools.
  3. Then there is ligdig, whose batch search gets hits from both ChEMBL and NCBI PubChem.

The above links should cover what you want to do.

A simple Google search does not hurt; it only enriches your data mining skills. We are here to guide and also to teach, so that you can help others in the future.


Thanks for your answer. I already checked the first and second links and did not get anything. Is there anything else?


Then please modify the question to say what you are trying to achieve, so that we can see exactly what you want. Did you see the third link? What information do you want to retrieve? Have you seen that there is also a download-CSV option in the PubChem link I gave? It shows how to use the resource for a list of compounds and download the results through the PubChem service. I have also commented above, below genomax's reply to your question, with a thread link that is in line with your query. Please take a look at that. If these do not meet your requirements, please update the question with more details about what you need so that we can come up with a more targeted solution.


I have a list of around 500 compounds that I need to download in SDF format. Now you understand exactly what I want. Can you help me?


@vchris_ngs has already given you pointers on how to do this. Combine that information with the FTP link I had posted.


Enough has been said, I think, with detailed pointers. The OP wants a finished result, which takes time for us to invest; in that case I believe the OP's own code needs to be shown. Otherwise, plenty of detail, including related threads, has been provided as a starting point that the OP can use to come up with a programmatic approach. This is my viewpoint. If any other expert can give a more targeted and precise answer to help the OP, the question is open.


Take a look at these threads, which have some useful answers that you might try to implement on your system.

Parsing Pubchem Compound Records - Take a look at all three answers, which explain in detail how to work with the metadata file to retrieve information. There is also a mention of a tool, PubCouch. See if you can use it for your own purposes.

Given Several Compound Reference Numbers, How To Get The Molecular Files - Here, too, some approaches are mentioned.

You have to come up with your own algorithm to perform the task, either in a scripting language or by SQL-querying a master table built from the SDF file. Given your list of compounds as input, it should fetch the relevant records from the big SDF file and produce an output SDF file with all the information you asked for.
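
One way the scripting route could look is sketched below. It assumes an uncompressed SDF file in which each record ends with a `$$$$` line and carries a `> <PUBCHEM_COMPOUND_CID>` data item whose value sits on the following line, as in PubChem's compound SDF files. The function name `filter_sdf` and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# filter_sdf IDS_FILE SDF_FILE
# Prints only those SDF records whose PUBCHEM_COMPOUND_CID is listed in
# IDS_FILE (one CID per line). Assumes a non-empty ID list.
filter_sdf() {
    awk '
        NR == FNR { want[$1]; next }                # first file: wanted CIDs
        { rec = rec $0 "\n" }                       # buffer the current record
        grab { cid = $1; grab = 0 }                 # line after the tag holds the CID
        /^> *<PUBCHEM_COMPOUND_CID>/ { grab = 1 }   # CID tag seen, read value next
        /^\$\$\$\$/ {                               # record terminator
            if (cid != "" && (cid in want)) printf "%s", rec
            rec = ""; cid = ""
        }
    ' "$1" "$2"
}
```

A call such as `filter_sdf ids.txt compounds.sdf > selected.sdf` would write the matching records to a new SDF file. Bulk files from the FTP site are gzip-compressed, so decompress them first (for example with `gzip -d`). Because the file is streamed record by record, memory use stays small even for large inputs.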

2.1 years ago

If you have a list of CIDs in a text file (one per line, here named pubchem_ids), you can use this for loop:

for id in $(cat pubchem_ids); do wget -O "${id}.sdf" "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/${id}/SDF"; done
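
For a list of several hundred CIDs, a batched variant may reduce the number of requests. PUG REST accepts a comma-separated list of CIDs in the same URL, and PubChem's usage policy asks clients to stay under about five requests per second. The sketch below is an illustration under those assumptions, not a tested production script: the function names, the default batch size of 100, and the one-second pause are all hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and the default batch size are illustrative.

# pug_sdf_url CIDS: print the PUG REST SDF URL for a comma-separated CID list.
pug_sdf_url() {
    printf 'https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/%s/SDF\n' "$1"
}

# batch_cids IDS_FILE [BATCH_SIZE]: print one comma-separated batch per line.
batch_cids() {
    xargs -n "${2:-100}" < "$1" | tr ' ' ','
}

# download_sdf_batches IDS_FILE [BATCH_SIZE]: fetch each batch into batch_N.sdf.
download_sdf_batches() {
    batch_cids "$1" "${2:-100}" | {
        n=0
        while read -r batch; do
            n=$((n + 1))
            wget -q -O "batch_${n}.sdf" "$(pug_sdf_url "$batch")" \
                || echo "batch ${n} (${batch}) failed" >&2
            sleep 1   # pause between requests to stay well under the rate limit
        done
    }
}
```

Running `download_sdf_batches pubchem_ids 100` would write `batch_1.sdf`, `batch_2.sdf`, and so on, each holding the records for up to 100 CIDs. If a batch fails, for instance because it contains an invalid CID, falling back to the per-CID loop for that batch is a reasonable way to find the culprit.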