Hi all,
This is my first time posting here, so please be patient. I am currently using Entrez (Biopython) to retrieve the number of articles for a given disease/indication. My data provide both the indication at level 3 (i.e. ATC level 3) and the ClinicalTrials.gov (NCT) identifier. To avoid ambiguity in the search (for instance, searching for "short term insomnia" gives different results than searching for "short-term insomnia"), I would like to search either by MeSH term or by NCT ID, restricted to the years 2004 to 2013. In summary: the input is an NCT ID (searched as a secondary source ID, I guess); the output is the number of articles for that NCT ID in 2004, the number in 2005, ..., and the number in 2013.
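One way to express that plan is to build the PubMed query string directly with field tags: `[si]` for the secondary source ID (the tag used in the question's code), `[mh]` for a MeSH heading, and `[dp]` for the publication year. This is a minimal sketch; the helper name `build_term` is hypothetical, and "Sleep Initiation and Maintenance Disorders" is used only as an example of a MeSH heading for insomnia. Putting the year into the term is an alternative to the `mindate`/`maxdate` parameters, not a replacement the original code requires.

```python
def build_term(key: str, year: int, kind: str = "nct") -> str:
    """Build a PubMed query restricted to one publication year (hypothetical helper).

    kind="nct":  key is a ClinicalTrials.gov ID such as "NCT00714714",
                 searched in the Secondary Source ID field [si].
    kind="mesh": key is a MeSH heading, searched with the [mh] tag.
    """
    if kind == "nct":
        field = "si"
    elif kind == "mesh":
        field = "mh"
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    # Quote the key so multi-word MeSH headings are searched as one phrase.
    return f'"{key}"[{field}] AND {year}[dp]'


# One query per year, 2004 to 2013 inclusive.
terms = {year: build_term("NCT00714714", year) for year in range(2004, 2014)}
```

Building the term as a plain string keeps the "NCT vs. MeSH" choice in one place, so switching strategies later does not touch the request loop.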
For now I am focusing on the safest approach, which is using NCT IDs as secondary source IDs (the `[SI]` field tag), but my code (below) does not seem to perform well:
```python
from Bio import Entrez

Entrez.email = "*****@domain.com"  # set once, before any request

id_list = ["NCT00714714[SI]", "NCT94839294[SI]",
           # ... more IDs ...
           "NCT00714584[SI]"]
years = range(2004, 2014)  # 2004 to 2013 inclusive

records = {}
for indication in id_list:
    for year in years:
        #handle = Entrez.efetch(db="pubmed", id=indication, rettype="gb", retmode="xml")
        #record = Entrez.read(handle)
        #abstract = record['PubmedArticle'][0]['MedlineCitation']['Article']
        # Search PubMed for this ID, limited to one publication year
        handle = Entrez.esearch(db="pubmed", term=indication,
                                mindate=year, maxdate=year, datetype="pdat",
                                usehistory="y")
        result = Entrez.read(handle)
        handle.close()
        # "Count" is returned as a string
        records[(indication, year)] = int(result["Count"])
```

Can someone please help me with this?
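For comparison, here is a minimal sketch of the same per-year count loop, assuming Biopython is installed. It asks only for the hit count (`retmax=0`, no history server) and always closes the handle. Biopython's Entrez module already spaces out requests to respect NCBI's rate limit, so no explicit sleep is added. The names `entrez_count` and `count_by_year` are hypothetical, and the `search` parameter exists only so the loop can be exercised offline with a stub instead of the live service.

```python
from typing import Callable, Dict, Iterable, Tuple


def entrez_count(term: str, year: int) -> int:
    """Return PubMed's hit count for `term`, limited to one publication year."""
    from Bio import Entrez  # imported lazily so the loop can be tested without Biopython

    Entrez.email = "you@example.com"  # placeholder: NCBI asks for a real contact address
    handle = Entrez.esearch(db="pubmed", term=term, retmax=0,
                            mindate=str(year), maxdate=str(year), datetype="pdat")
    try:
        # Only the total hit count is needed; it comes back as a string.
        return int(Entrez.read(handle)["Count"])
    finally:
        handle.close()


def count_by_year(ids: Iterable[str], years: Iterable[int],
                  search: Callable[[str, int], int] = entrez_count
                  ) -> Dict[Tuple[str, int], int]:
    """Map every (id, year) pair to the number of matching PubMed records."""
    year_list = list(years)  # materialize so it can be reused for every ID
    return {(term, year): search(term, year)
            for term in ids for year in year_list}
```

Against the live service this would be called as `count_by_year(id_list, range(2004, 2014))`.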
Edited by @Joe to redact email address
You need to be more specific about what it is about this code that isn't "performing well". Does this mean it doesn't work? Isn't fast enough? Returns the wrong information?
Hi, and thank you for the comment.
So, to make it simpler, if I run something really basic like:
for a single NCT ID, the output is
basically a "not found", which seems very strange to me, since the NCT ID is taken directly from a 2008 trial in my database.
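When a per-year search comes back empty, a first check is whether PubMed matches the ID at all, with no date limit, and whether it matches only in the `[si]` field or anywhere in the record (for example, the abstract). A minimal sketch, assuming Biopython and the standard ClinicalTrials.gov format (`NCT` followed by eight digits); the names `is_nct_id`, `query_variants` and `total_counts` are hypothetical, and whether a given article carries its NCT ID in the `[si]` field depends on how that article was indexed.

```python
import re
from typing import Dict

NCT_PATTERN = re.compile(r"NCT\d{8}")


def is_nct_id(value: str) -> bool:
    """True if value looks like a bare ClinicalTrials.gov ID, e.g. NCT00714714."""
    return NCT_PATTERN.fullmatch(value) is not None


def query_variants(nct: str) -> Dict[str, str]:
    """Queries to try, from most to least specific, with no date restriction."""
    if not is_nct_id(nct):
        raise ValueError(f"not a bare NCT ID: {nct!r}")
    return {
        "secondary_source_id": f"{nct}[si]",  # databank field only
        "all_fields": nct,                    # anywhere PubMed indexes it
    }


def total_counts(nct: str) -> Dict[str, int]:
    """Run each variant against PubMed (requires Biopython and network access)."""
    from Bio import Entrez

    Entrez.email = "you@example.com"  # placeholder: use your own address
    counts = {}
    for label, term in query_variants(nct).items():
        handle = Entrez.esearch(db="pubmed", term=term, retmax=0)
        try:
            counts[label] = int(Entrez.read(handle)["Count"])
        finally:
            handle.close()
    return counts
```

Note that an entry such as `"NCT00714714[SI]"` must have its field tag stripped before validation. If `total_counts` reports hits under `all_fields` but none under `secondary_source_id`, the ID is probably mentioned in the text rather than in the databank field; if both are zero, PubMed may simply have no article linked to that trial.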