Question: efetch JSON output which is not real JSON
3.0 years ago, Lilizine wrote:

Hi all,

I am using this code to download PubMed articles:

search_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&mindate=2010/01/01&maxdate=2016/12/31&usehistory=y&retmode=json"
search_r = requests.post(search_url)
search_data = search_r.json()
webenv = search_data["esearchresult"]['webenv']
total_records = int(search_data["esearchresult"]['count'])
fetch_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmax=9999&query_key=1&webenv="+webenv

for i in range(0, total_records, 10000):
    this_fetch = fetch_url+"&retstart="+str(i)
    print("Getting this URL: "+this_fetch)
    fetch_r = requests.post(this_fetch)
    f = open('pubmed_batch_'+str(i)+'_to_'+str(i+9999)+".json", 'w')
    f.write(fetch_r.text)
    f.close()

I want the output in XML, not JSON. The problem appears when I try this:

page = urllib.urlopen('one of the URLs')
content = page.read()
obj = json.loads(content)
xml = dicttoxml.dicttoxml(content)
print(xml)

I have this error:

No JSON object could be decoded

Ideally, can I extract XML directly and avoid the JSON outputs, which are not recognized as JSON?

PS: please ignore the indentation issues caused by copy-paste.

Thanks

pubmed biopython json efetch python • 1.7k views
modified 20 months ago by MatthewP • written 3.0 years ago by Lilizine

I don't understand the issue here. If you want xml output from NCBI, just use retmode=xml in the url.

written 3.0 years ago by Jean-Karim Heriche

I already did this and it returns an error.

written 3.0 years ago by Lilizine

What returns an error? What's the error message?

written 3.0 years ago by Jean-Karim Heriche

I changed the following:

search_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&mindate=2010/01/01&maxdate=2016/12/31&usehistory=y&retmode=xml"
search_r = requests.post(search_url)
search_data = search_r.xml()
webenv = search_data["esearchresult"]['webenv']
total_records = int(search_data["esearchresult"]['count'])
fetch_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmax=9999&query_key=1&webenv="+webenv

for i in range(0, total_records, 10000):
    this_fetch = fetch_url+"&retstart="+str(i)
    print("Getting this URL: "+this_fetch)
    fetch_r = requests.post(this_fetch)
    f = open('pubmed_batch_'+str(i)+'_to_'+str(i+9999)+".txt", 'w')
    f.write(fetch_r.text)
    f.close()

print("Number of records found :"+str(total_records))

I got this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-8-262417e4aa63> in <module>()
      1 search_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&mindate=2010/01/01&maxdate=2016/12/31&usehistory=y&retmode=xml"
      2 search_r = requests.post(search_url)
----> 3 search_data = search_r.xml()
      4 webenv = search_data["esearchresult"]['webenv']
      5 total_records = int(search_data["esearchresult"]['count'])

AttributeError: 'Response' object has no attribute 'xml'

modified 3.0 years ago • written 3.0 years ago by Lilizine

I don't know which programming language you're using (Python maybe? I don't know Python), but this indicates that your object has no method called xml. Maybe the module you're using cannot handle XML. The solution is to deal with the XML yourself, perhaps with the help of another module. This now looks more like a programming question than a bioinformatics one, so you may have better luck asking on StackOverflow. Alternatively, explain what you're trying to do, i.e. what bioinformatics problem you're trying to solve. There may already be a solution.

written 3.0 years ago by Jean-Karim Heriche
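
For anyone hitting the same AttributeError: a requests Response offers .json() and .text but no .xml(), so the XML has to be parsed separately, as suggested above. Below is a minimal sketch using the standard library's xml.etree.ElementTree. The sample string stands in for search_r.text, the WebEnv value is a made-up placeholder, and the element names (Count, QueryKey, WebEnv) are what eSearch is assumed to return with retmode=xml. The function name parse_esearch_xml is hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in for search_r.text; the values here are made-up placeholders.
sample_esearch_xml = """<eSearchResult>
  <Count>12345</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>1</QueryKey>
  <WebEnv>EXAMPLE_WEBENV</WebEnv>
</eSearchResult>"""


def parse_esearch_xml(xml_text):
    """Return (webenv, query_key, count) from an eSearch XML response.

    Assumes the root element has direct children named WebEnv,
    QueryKey and Count.
    """
    root = ET.fromstring(xml_text)
    webenv = root.findtext("WebEnv")
    query_key = root.findtext("QueryKey")
    count = int(root.findtext("Count"))
    return webenv, query_key, count


webenv, query_key, total_records = parse_esearch_xml(sample_esearch_xml)
print(webenv, query_key, total_records)
```

With a live request, the same function would be called on search_r.text in place of the sample string.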
20 months ago, MatthewP (China) wrote:

Hello, you set retmode=json, but this may not be supported for db=pubmed. This table shows the default and valid retmode values for all E-utilities databases.

modified 20 months ago • written 20 months ago by MatthewP
20 months ago, jrdeans wrote:

So far, retmode='json' works for eSearch queries but not for eFetch.

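import json
from urllib import request

# search_url is the eSearch URL from the question (retmode=json, usehistory=y)
with request.urlopen(search_url) as response:
    content = response.read()
data = json.loads(content)

# Values needed to fetch the results from the history server
web = data['esearchresult']['webenv']
key = data['esearchresult']['querykey']
count = int(data['esearchresult']['count'])
ids = data['esearchresult']['idlist']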

When you are planning to eFetch 'pubmed' data you have three options for retmode (asn.1, text, xml). Within the text category you can select 3 different kinds of rettypes (medline, uilist, abstract) of which only 'medline' will get you all the data of the publication. Both 'asn.1' and 'xml' retmodes do not have associated rettypes, so you can ignore setting that field in those scenarios [1].

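from urllib import request
from bs4 import BeautifulSoup

# fetch_url is the eFetch URL built from the webenv and query key above
with request.urlopen(fetch_url) as response:
    content = response.read()
soup = BeautifulSoup(content, 'html.parser')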
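
The retmode/rettype choices described above can be put into the fetch URL explicitly. Here is a rough sketch that builds one eFetch URL per batch. The helper name efetch_urls and the placeholder WebEnv value are hypothetical, and the parameter spelling WebEnv follows my reading of the E-utilities documentation. Note that retmax equals the loop step here, so no record falls between batches (the question's code uses retmax=9999 with a step of 10000).

```python
from urllib.parse import urlencode

EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_urls(webenv, query_key, total_records, batch_size=10000,
                retmode="xml", rettype=None):
    """Yield one eFetch URL per batch of records on the history server."""
    for start in range(0, total_records, batch_size):
        params = {
            "db": "pubmed",
            "query_key": query_key,
            "WebEnv": webenv,
            "retstart": start,
            "retmax": batch_size,  # same as the loop step, so nothing is skipped
            "retmode": retmode,
        }
        if rettype is not None:  # only relevant with retmode="text"
            params["rettype"] = rettype
        yield EFETCH_BASE + "?" + urlencode(params)


# Placeholder values; in practice these come from the eSearch result.
urls = list(efetch_urls("EXAMPLE_WEBENV", "1", total_records=25000))
medline_urls = list(efetch_urls("EXAMPLE_WEBENV", "1", total_records=1,
                                retmode="text", rettype="medline"))
for url in urls:
    print(url)
```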

You will get real JSON output for your eSearch results, but HTML/XML for the eFetch call. There is currently no way to get eFetch results in JSON format. Hopefully NCBI adds this functionality to their other eUtils tools soon!

modified 20 months ago • written 20 months ago by jrdeans
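
Since eFetch returns XML for PubMed, the records can also be read with the standard library instead of being converted to JSON first. This is a small sketch under the assumption that the response is a PubmedArticleSet with the usual MedlineCitation/PMID and Article/ArticleTitle elements. The sample string and its PMIDs and titles are made-up placeholders, and extract_titles is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Stand-in for fetch_r.text; PMIDs and titles are made-up placeholders.
sample_efetch_xml = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>11111111</PMID>
      <Article>
        <ArticleTitle>First placeholder title.</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>22222222</PMID>
      <Article>
        <ArticleTitle>Second <i>placeholder</i> title.</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def extract_titles(xml_text):
    """Return a list of (pmid, title) pairs from an eFetch XML response."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        title_el = article.find("MedlineCitation/Article/ArticleTitle")
        # itertext() also collects text inside inline markup such as <i>
        title = "".join(title_el.itertext()) if title_el is not None else None
        records.append((pmid, title))
    return records


records = extract_titles(sample_efetch_xml)
for pmid, title in records:
    print(pmid, title)
```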