Question: Retrieve PubMed records based on some genes
2.6 years ago by
willson10 wrote:


I am trying to retrieve PubMed records with the Biopython library based on some gene names (e.g. all PMIDs whose abstracts contain these gene names). I wrote the following code and it returns some results, but I am not sure that it is working correctly. I am wondering whether this code will miss articles that use alternative gene symbols (e.g. P53 for TP53) or synonyms. Also, can I trust PubMed's filtering with this approach, or should I fetch all of the abstracts and search/filter them manually?

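from Bio import Entrez, Medline

Entrez.email = "you@example.com"  # placeholder; NCBI asks for a contact address

# Search PubMed and collect the matching PMIDs
handle = Entrez.esearch(db="pubmed", term="TP53[gene] AND BRCA1[gene] AND CXCL12[gene]")
record = Entrez.read(handle)  # parse the esearch XML result
handle.close()
idlist = record["IdList"]

# Fetch the matching records in MEDLINE format
handle = Entrez.efetch(db="pubmed", id=idlist, rettype="medline", retmode="text")  # See medline format table
records = list(Medline.parse(handle))
handle.close()
for record in records:
    print("title:", record.get("TI", "?"))
    print("authors:", record.get("AU", "?"))
    print("source:", record.get("SO", "?"))
    print("abstract:", record.get("AB", "?"))  # Abstracts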
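For the manual filtering option raised above, a minimal sketch of checking fetched abstracts against a list of aliases could look like this. The alias table is illustrative only (a real one would have to come from a curated source), and the names `GENE_ALIASES`, `compile_patterns` and `genes_mentioned` are hypothetical helpers, not part of Biopython.

```python
import re

# Illustrative alias table (an assumption, not taken from the thread apart
# from P53/TP53); a real table should come from a curated gene nomenclature.
GENE_ALIASES = {
    "TP53": ["TP53", "P53"],
    "BRCA1": ["BRCA1"],
    "CXCL12": ["CXCL12", "SDF1", "SDF-1"],
}


def compile_patterns(aliases):
    """Build one case-insensitive regex per gene symbol."""
    patterns = {}
    for symbol, names in aliases.items():
        alternatives = "|".join(re.escape(name) for name in names)
        # The lookarounds require a non-alphanumeric character (or the text
        # boundary) on both sides, so "P53" does not match inside "XTP530".
        patterns[symbol] = re.compile(
            rf"(?<![A-Za-z0-9])(?:{alternatives})(?![A-Za-z0-9])",
            re.IGNORECASE,
        )
    return patterns


def genes_mentioned(abstract, patterns):
    """Return the set of gene symbols whose aliases occur in the abstract."""
    return {symbol for symbol, pattern in patterns.items() if pattern.search(abstract)}


patterns = compile_patterns(GENE_ALIASES)
```

With the records parsed by `Medline.parse`, something like `genes_mentioned(record.get("AB", ""), patterns)` could then be applied to each abstract. Plain string matching of this kind will still miss spelled-out names and pick up ambiguous abbreviations, so it is a rough check rather than a replacement for a curated search.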
pubmed biopython python gene • 1.4k views
modified 2.4 years ago by Maria_Levchenko60 • written 2.6 years ago by willson10

I am going to make some general comments.

You will want to use OR instead of AND in your terms: with NCBI eUtils I don't get any hits when all three genes in the example above are combined with AND. A ton of hits appear if the terms are used individually or combined with OR. What is your ultimate aim in doing this? There must be a lot of records in PubMed with these terms.

My search was done using:

esearch -db pubmed -query "TP53[gene] OR BRCA1[gene] OR CXCL12[gene]" | efetch -format abstract
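A rough Biopython counterpart of the search step of that command, as a sketch: `build_term` and `search_pmids` are hypothetical helper names, the e-mail address must be supplied by the caller, and `retmax=100` is an arbitrary choice.

```python
def build_term(genes, field="gene", operator="OR"):
    """Join gene names into one query string, e.g. 'TP53[gene] OR BRCA1[gene]'."""
    joiner = f" {operator} "
    return joiner.join(f"{gene}[{field}]" for gene in genes)


def search_pmids(term, email, retmax=100):
    """Run an esearch against PubMed and return (total hit count, list of PMIDs)."""
    # Imported here so that build_term can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esearch(db="pubmed", term=term, retmax=retmax)
    try:
        record = Entrez.read(handle)
    finally:
        handle.close()
    return record["Count"], record["IdList"]
```

Calling `search_pmids(build_term(["TP53", "BRCA1", "CXCL12"]), email="you@example.com")` would send the same OR query as the command above; passing `operator="AND"` gives the original query for comparison of the hit counts.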
modified 2.6 years ago • written 2.6 years ago by GenoMax94k
2.4 years ago by
Maria_Levchenko60 wrote:

If you would like to retrieve PubMed records containing a certain gene name and its aliases/synonyms, you can use the search module of the Europe PMC RESTful API with the synonym parameter set to true. This will expand your query to include synonyms found in the MeSH vocabulary and the UniProt synonyms list, meaning that if you search for p53, you will also retrieve TP53, TRP53, pp53 (for phosphorylated p53). Here is an example query in JSON: (*&pageSize=25&format=json).
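As a sketch of what such a request could look like from Python, using only the standard library: the endpoint URL and the `resultList`/`result`/`pmid` field names in the response are assumptions based on the public Europe PMC REST documentation, not taken from this thread, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the Europe PMC REST search module.
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query, synonym=True, page_size=25):
    """Build a search URL with synonym expansion switched on or off."""
    params = {
        "query": query,
        "synonym": "true" if synonym else "false",
        "pageSize": page_size,
        "format": "json",
    }
    return EUROPE_PMC_SEARCH + "?" + urllib.parse.urlencode(params)


def extract_pmids(response):
    """Pull PMIDs out of a parsed JSON response (assumed field names)."""
    results = response.get("resultList", {}).get("result", [])
    # Not every hit has a PMID (e.g. preprints), so skip those without one.
    return [hit["pmid"] for hit in results if "pmid" in hit]


def search_pmids(query, **kwargs):
    """Fetch one page of results and return the PMIDs it contains."""
    with urllib.request.urlopen(build_search_url(query, **kwargs)) as resp:
        return extract_pmids(json.load(resp))
```

`search_pmids("p53")` would return the PMIDs from the first page only; comparing it with `search_pmids("p53", synonym=False)` is one way to see what the synonym expansion adds.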

Alternatively, you can retrieve articles that contain gene mentions using the Annotations API. This is based on public text-mining data available programmatically in Europe PMC. Gene names are one of the text-mined entities, and you can retrieve a list of all articles that contain a specific gene name in the abstract or full text. The text-mining ensures that all synonyms are taken into account. You can try it out here:!/annotations45api45controller/getAnnotationsArticlesByEntityUsingGET. Input the name of the entity (in your case, p53) and select the output format. If you only want a list of PMIDs, go with ID_LIST. An example would be this search for p53:

Disclaimer: I work for Europe PMC

written 2.4 years ago by Maria_Levchenko60

@Maria: Please consider creating a separate tool post describing the PMC API instead of posting this information in multiple old threads. You can describe the full functionality of the tool in one place.

modified 2.4 years ago • written 2.4 years ago by GenoMax94k

Oh, great point. I had not thought about it.

written 2.4 years ago by Maria_Levchenko60