Question

NCT search in PubMed via Entrez (Python)


3.7 years ago

federico.nutarelli • 0

Hi all,

This is my first post here, so please be patient. I am working with Entrez (Biopython) to retrieve the number of articles for a given disease/indication. My data provide both the indication at ATC level 3 and the National Clinical Trial (NCT) identifier. Free-text searches are ambiguous (for instance, "short term insomnia" gives different results from "short-term insomnia"), so I would like to search either by MeSH term or by NCT ID, restricted to the years 2004 to 2013. In summary:

Input: NCT ID (as a secondary key, I guess).
Output: the number of articles for that NCT ID in 2004, in 2005, ..., in 2013.
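One way to sidestep the free-text ambiguity is to build fielded query strings rather than typing phrases by hand. The following is a minimal sketch, assuming PubMed's `[SI]` (secondary source ID), `[MeSH Terms]`, and `[dp]` (publication date) field tags; the helper names `nct_term`, `mesh_term`, and `with_year` are hypothetical and not part of Biopython.

```python
def nct_term(nct_id: str) -> str:
    """Tag an NCT identifier as a secondary source ID."""
    return f"{nct_id.strip().upper()}[SI]"


def mesh_term(heading: str) -> str:
    """Quote a MeSH heading so it is matched as a single phrase."""
    return f'"{heading.strip()}"[MeSH Terms]'


def with_year(term: str, year: int) -> str:
    """Restrict a fielded term to one publication year."""
    return f"({term}) AND {year}[dp]"


# Build the query string for one NCT ID and one year.
print(with_year(nct_term("NCT00714714"), 2008))
```

The resulting string could be passed as `term` to `Entrez.esearch`, as an alternative to the `mindate`/`maxdate` arguments.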

For now I am focusing on the safest approach, which is to use NCT IDs as secondary keys, but my code (shown below) does not seem to perform well:

from Bio import Entrez

Entrez.email = "*****@domain.com"  # set once, before any request

# NCT identifiers tagged with the secondary source ID field [SI]
id_list = ["NCT00714714[SI]", "NCT94839294[SI]", "NCT00714584[SI]"]  # ... further IDs elided
years = list(range(2004, 2014))  # 2004 to 2013 inclusive

search_results = {}  # raw esearch result per (ID, year)
records = {}         # article count per (ID, year)

for indication in id_list:
    for year in years:
        handle = Entrez.esearch(db="pubmed",
                                term=indication,
                                mindate=str(year), maxdate=str(year), datetype="pdat",
                                usehistory="y")
        search_results[(indication, year)] = Entrez.read(handle)
        handle.close()
        records[(indication, year)] = int(search_results[(indication, year)]["Count"])
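To get from the `(ID, year)` dictionary to the output described above (one row of yearly counts per NCT ID), the counts can be pivoted. This is a minimal sketch; `counts_by_id` is a hypothetical helper, and the counts in the example are made up purely to show the shape of the result.

```python
from collections import defaultdict


def counts_by_id(records, years):
    """Turn {(id, year): count} into {id: [count for each year, in order]}."""
    position = {year: i for i, year in enumerate(years)}
    table = defaultdict(lambda: [0] * len(years))
    for (identifier, year), count in records.items():
        table[identifier][position[year]] = count
    return dict(table)


# Made-up counts for a placeholder identifier, not real PubMed results.
example = {("NCT00000000[SI]", 2004): 1, ("NCT00000000[SI]", 2005): 3}
print(counts_by_id(example, [2004, 2005]))
```

Years with no entry for an ID stay at 0, so every row has the same length.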

Can someone please help me with this?

Edited by @Joe to redact email address

python biopython statistics NCT • 926 views

updated 3.7 years ago by Joe 21k • written 3.7 years ago by federico.nutarelli • 0


You need to be more specific about what it is about this code that isn't "performing well". Does this mean it doesn't work? Isn't fast enough? Returns the wrong information?

3.7 years ago by Joe 21k


Hi, and thank you for the comment.

what it is about this code that isn't "performing well"

To keep it simple: if I run something really basic like:

from Bio import Entrez

Entrez.email = "*****@domain.com"
identifier = "NCT00714714[SI]"
# rettype="gb" dropped: it is not an esearch return type
handle = Entrez.esearch(db="pubmed", term=identifier, retmode="xml")
record = Entrez.read(handle)
print(record)

for a single NCT ID, the output is:

{'Count': '0', 'RetMax': '0', 'RetStart': '0', 'IdList': [], 'TranslationSet': [], 'QueryTranslation': '(NCT00714714[SI])', 'ErrorList':  {'FieldNotFound': [], 'PhraseNotFound': ['NCT00714714[SI]']}, 'WarningList': {'PhraseIgnored': [], 'QuotedPhraseNotFound': [], 'OutputMessage': ['No items found.']}}

So the result is basically "not found", which seems very strange to me, since the NCT ID is taken directly from a 2008 trial in my database.
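The `ErrorList` in that output can be checked programmatically, which helps tell "PubMed has nothing indexed under this phrase" apart from other empty results. A minimal sketch, assuming the result has the same keys as the dictionary shown above; `explain_esearch` is a hypothetical helper, not a Biopython function. One possible reason for `PhraseNotFound` is that no PubMed record lists this NCT ID in its secondary source ID field, although that is only a guess here.

```python
def explain_esearch(result: dict) -> str:
    """Summarise an esearch result, flagging phrases PubMed could not match."""
    count = int(result.get("Count", 0))
    if count > 0:
        return f"{count} record(s) found"
    not_found = result.get("ErrorList", {}).get("PhraseNotFound", [])
    if not_found:
        return "no match, phrase not found: " + ", ".join(not_found)
    return "no match"


# Trimmed copy of the output shown above.
result = {
    "Count": "0",
    "IdList": [],
    "QueryTranslation": "(NCT00714714[SI])",
    "ErrorList": {"FieldNotFound": [], "PhraseNotFound": ["NCT00714714[SI]"]},
}
print(explain_esearch(result))
```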

3.7 years ago by federico.nutarelli • 0