Question

Retrieving Pubchem Ids

3

12.1 years ago

Nitin ▴ 170

Hi all,

I have a list of compound names for which I want to retrieve PubChem CIDs. To achieve this I wrote a Biopython script as follows, but it doesn't seem to be working:

from Bio import Entrez

Entrez.email = "sainitin7@gmail.com"
infile = open("data", "r")
out_put = open("ids_data.csv","w")

for line in infile.readlines():
  single_id = line
  #Post list of ids to database
  handle= Entrez.epost("pccompound",names=single_id)
  record = Entrez.read(handle)
  #history
  webEnv=record["WebEnv"]
  queryKey=record["QueryKey"]
  #Retreiving information
  data = Entrez.esummary(db="pccompound",webenv=webEnv,query_key=queryKey)
  res=Entrez.read(data)
  for compound in res:
    Name = compound["SynonymList"]
    Cid = compound["Id"]
    print "%s:%s" %(Name,Cid)
    out_put.write("%s:%s\n" %(Name,Cid))

out_put.close()

Ideally I want output as follows:

Biruvidine : 446727

Can anybody help?

Thanks in advance

Nit

biopython entrez ncbi • 5.4k views

ADD COMMENT • link updated 12.1 years ago by Peter 6.0k • written 12.1 years ago by Nitin ▴ 170

0

Can you fix the formatting? The example is very hard to read, and you didn't show the current output. It sounds like, given a PubChem identifier like SID 74891762, you want to get back 'Brivudine: CID446727' - is that right?

ADD REPLY • link 12.1 years ago by Peter 6.0k

0

And what exactly do you mean by the script not working? What is the error message, if any?

ADD REPLY • link 12.1 years ago by Neilfws 49k

0

Thanks for fixing the formatting. Could you also include an example of the text in ids_data.csv so we have both sample input AND the desired output?

ADD REPLY • link 12.1 years ago by Peter 6.0k

score 1 · Answer 1 · 2012-03-23

a) You are not using the right tools. Here is a simpler and more robust solution using the Cactvs Chemoinformatics toolkit (see www.xemistry.com/academic for the free academic version):

foreach name [split [string trim [read_file data]] "\n"] {
        if {[catch {ens create $name} eh]} {
                puts "$name : not resolved"
        } elseif {[catch {ens get $eh E_CID} cid]} {
                puts "$name: no CID"
                ens delete $eh
        } else {
                puts "$name: $cid"
                ens delete $eh
        }
}

b) Even this script does not work with "Biruvidine", because the proper name of that compound is "Brivudine". The correct name resolves easily.

Interactive lookup of the name set for a CID:

cactvs>ens create CID446727
ens0
cactvs>ens get ens0 E_NAMESET
{5-[(E)-2-bromoethenyl]-1-[(2R,4S,5R)-4-hydroxy-5-(hydroxymethyl)oxolan-2-yl]pyrimidine-2,4-dione} {5-[(E)-2-bromovinyl]-1-[(2R,4S,5R)-4-hydroxy-5-(hydroxymethyl)tetrahydrofuran-2-yl]pyrimidine-2,4-dione} {5-[(E)-2-bromovinyl]-1-[(2R,4S,5R)-4-hydroxy-5-(hydroxymethyl)-2-tetrahydrofuranyl]pyrimidine-2,4-dione} {5-[(E)-2-bromovinyl]-1-[(2R,4S,5R)-4-hydroxy-5-methylol-tetrahydrofuran-2-yl]pyrimidine-2,4-quinone} 69304-47-8 (E)-5-(2-Bromovinyl)-2'-deoxyuridine (E)-5-(2-Bromovinyl)-deoxyuridine BVDU {Brivudina [INN-Spanish]} Brivudine {Brivudine [INN]} {Brivudinum [INN-Latin]} {CCRIS 2831} Helpin {NSC 633770} {Uridine, 5-(2-bromoethenyl)-2'-deoxy-, (E)-} {Uridine, 5-(2-bromovinyl)-2'-deoxy-, (E)-} trans-5-(2-Bromovinyl)-2'-deoxyuridine Lopac0_000175 (E)-5-(2-Bromovinyl)-dUrd AIDS-070967 AIDS070967 BV-dUrd BrVdUrd Brivudin UA-618 EU-0100175 A-176 5-BROMOVINYLDEOXYURIDINE BVD Bromovinyldeoxyuridine RP-101 Zostex

score 1 · Answer 2 · 2012-03-23

1

12.1 years ago

Peter 6.0k

I would have expected to do this with EFetch, but NCBI doesn't seem to support this database with EFetch: http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetch_help.html

Here's how I would do it for one ID, a PubChem identifier like CID 446727. You can of course generalise this to read the IDs from a file and use epost and the history, as you did above.

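from Bio import Entrez
Entrez.email = "sainitin7@gmail.com"
# ESummary returns one entry per requested ID
record = Entrez.read(Entrez.esummary(db="pccompound", id="446727", retmode="xml"))
for entry in record:
    # print() with a single argument works in both Python 2 and Python 3
    print(entry['SynonymList'])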

I presume that the first synonym is the one you want. In this case the list you get back is:

['Brivudine', 'BVDU', 'Helpin', 'Brivudin', "(E)-5-(2-Bromovinyl)-2'-deoxyuridine", 'Bromovinyldeoxyuridine', 'Brivudinum [INN-Latin]', 'Brivudina [INN-Spanish]', 'CCRIS 2831', '69304-47-8', 'Brivudine (INN)', 'Brivudine [INN]', "Uridine, 5-(2-bromoethenyl)-2'-deoxy-, (E)-", '(E)-5-(2-Bromovinyl)-deoxyuridine', 'NSC 633770', "trans-5-(2-Bromovinyl)-2'-deoxyuridine", "Uridine, 5-(2-bromovinyl)-2'-deoxy-, (E)-", 'Zostex', 'BVD', '5-BVDU', 'E-5-(2-bromovinyl)-dUrd', 'Z-5-(2-bromovinyl)-dUrd', 'Brivudinum', 'Brivudina', 'BrVdUrd', 'NSC633770', 'BV-dUrd', "5-(2-bromovinyl)-2'-deoxyuridine", 'Bromvinyldesoxyuridin', "5-(2-bromoethenyl)-2'-deoxyuridine", 'Zostex (TN)', "(Z)-5-(2-bromovinyl)-2'-deoxyuridine", 'Lopac0_000175', 'C11H13BrN2O5', 'CHEMBL31634', '5-BROMOVINYLDEOXYURIDINE', 'AC1L9K12', '(E)-5-(2-Bromovinyl)-dUrd', 'UNII-2M3055079H', 'RP-101', 'UA-618', 'CCG-204270', 'NCGC00093656-01', 'NCGC00093656-02', 'NCGC00093656-03', 'LS-160809', 'A-176', 'EU-0100175', 'B 9647', 'D07249', '5-[(E)-2-bromoethenyl]-1-[(2R,4S,5R)-4-hydroxy-5-(hydroxymethyl)oxolan-2-yl]pyrimidine-2,4-dione']
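
Generalising this to a file of IDs with epost and the history might look roughly like the sketch below. It assumes one CID per line in the input, and that each summary entry carries the "Id" and "SynonymList" keys used elsewhere in this thread. The function names and the e-mail argument are illustrative and not part of Biopython.

```python
def read_ids(lines):
    """Return the non-empty, stripped identifiers from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


def summary_line(entry):
    """Format one ESummary entry as 'first synonym : CID'."""
    # Assumption: the first synonym is the preferred name, as presumed above.
    synonyms = entry.get("SynonymList") or ["(no synonym)"]
    return "%s : %s" % (synonyms[0], entry["Id"])


def summarise_cids(cids, email):
    """Post the CIDs to the history server and summarise them (needs network)."""
    # Imported here so the two helpers above can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email
    # EPost takes UIDs (not names) as a comma-separated string.
    posted = Entrez.read(Entrez.epost("pccompound", id=",".join(cids)))
    handle = Entrez.esummary(
        db="pccompound",
        webenv=posted["WebEnv"],
        query_key=posted["QueryKey"],
    )
    try:
        return [summary_line(entry) for entry in Entrez.read(handle)]
    finally:
        handle.close()
```

With a hypothetical file cids.txt, something like summarise_cids(read_ids(open("cids.txt")), "you@example.org") should give one "name : CID" line per compound, provided NCBI is reachable.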

ADD COMMENT • link 12.1 years ago by Peter 6.0k

1

Just download the complete CID-synonym file from the Compound "Extras" folder and grep for the name and all CIDs associated. Or use eUtils which is supported.

ADD REPLY • link 12.1 years ago by Evan Bolton ▴ 50
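
A rough sketch of the bulk-file approach follows. It assumes the file from the Extras folder is gzip-compressed with one tab-separated CID and synonym per line (CID-Synonym-filtered.gz is the name I would expect, but check the README on the FTP site). The function names are made up for illustration. Matching is exact apart from case, so a misspelling such as "Biruvidine" will still find nothing.

```python
import gzip
from collections import defaultdict


def cids_by_name(lines, names):
    """Map each requested name to the CIDs whose synonym matches it exactly.

    `lines` is any iterable of 'CID<TAB>synonym' strings; matching ignores case.
    """
    # Lower-cased lookup key -> the spelling the caller asked for.
    wanted = {n.strip().lower(): n.strip() for n in names if n.strip()}
    hits = defaultdict(list)
    for line in lines:
        cid, sep, synonym = line.rstrip("\n").partition("\t")
        if not sep:
            continue  # skip lines without a tab separator
        key = synonym.strip().lower()
        if key in wanted:
            hits[wanted[key]].append(int(cid))
    return dict(hits)


def cids_from_file(path, names):
    """Scan a gzip-compressed CID/synonym file once for all requested names."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return cids_by_name(handle, names)
```

Names that never match are simply absent from the returned dictionary, which makes it easy to list the ones that need a spelling check.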

0

I think this post answers the reverse of the question - going from CID to name set. The problem was getting the CID from a name.

ADD REPLY • link 12.1 years ago by Wolf Ihlenfeldt ▴ 150

0

You're probably right. It would have helped if the question had included an example input as well as the hoped-for output.

ADD REPLY • link 12.1 years ago by Peter 6.0k

0

The basic idea is correct, though, i.e. use esummary.

ADD REPLY • link 12.1 years ago by Neilfws 49k

0

Starting with a name, use esearch?

ADD REPLY • link 12.1 years ago by Peter 6.0k
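
Going from a name to CIDs with ESearch could be sketched as follows. The [CompleteSynonym] field tag is an assumption about how the pccompound database indexes whole names, so it is worth checking against the field list that EInfo reports. The function names and the e-mail argument are illustrative only.

```python
def synonym_term(name):
    """Build an Entrez query restricted to whole-synonym matches."""
    # Assumption: pccompound indexes full synonyms under [CompleteSynonym].
    cleaned = " ".join(name.split())  # collapse stray whitespace and newlines
    return '"%s"[CompleteSynonym]' % cleaned.replace('"', "")


def cids_from_record(record):
    """Pull the CID list out of a parsed ESearch result."""
    return [int(uid) for uid in record.get("IdList", [])]


def name_to_cids(name, email, retmax=20):
    """Look up the CIDs whose synonyms include `name` (needs network)."""
    # Imported here so the two helpers above can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.esearch(db="pccompound", term=synonym_term(name), retmax=retmax)
    try:
        record = Entrez.read(handle)
    finally:
        handle.close()
    return cids_from_record(record)
```

A name can map to several CIDs (salts, stereoisomers), so the function returns a list. A misspelt name should simply come back as an empty list rather than an error.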