Retrieve PubMed records based on some genes
3.4 years ago
willson ▴ 10


I am trying to retrieve PubMed records with the Biopython library based on some gene names (e.g. all PMIDs whose abstracts contain these gene names). I wrote the following code and it returns some results, but I am not sure it is working correctly. I am wondering whether this code will miss articles that use alternative gene symbols (e.g. P53 for TP53) or synonyms. Also, can I trust PubMed's filtering with this approach, or should I fetch all of the abstracts and search/filter them manually?

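from Bio import Entrez, Medline

Entrez.email = "you@example.com"  # NCBI asks for a contact address; replace with your own

# Search PubMed; note that esearch returns at most 20 IDs unless retmax is set
handle = Entrez.esearch(db="pubmed", term="TP53[gene] AND BRCA1[gene] AND CXCL12[gene]")
record = Entrez.read(handle)
handle.close()
idlist = record["IdList"]
# Fetch the matching records in MEDLINE text format
handle = Entrez.efetch(db="pubmed", id=idlist, rettype="medline", retmode="text")  # See medline format table
records = list(Medline.parse(handle))
handle.close()
for record in records:
    print("title:", record.get("TI", "?"))
    print("authors:", record.get("AU", "?"))
    print("source:", record.get("SO", "?"))
    print("abstract:", record.get("AB", "?"))  # Abstracts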
python biopython pubmed gene • 1.6k views

I am going to make some general comments.

You will want to use OR instead of AND in your search term: with AND, I don't get any hits for all three genes in the example above when using NCBI eUtils. A ton of hits appear if the terms are used individually or combined with OR. What is your ultimate aim in doing this? There must be a lot of records in PubMed with these terms.

My search was done using:

esearch -db pubmed -query "TP53[gene] OR BRCA1[gene] OR CXCL12[gene]" | efetch -format abstract
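
If you want to stay in Python and also cover alternative symbols, one option is to build the OR term yourself. This is only a rough sketch: `build_pubmed_term` is a hypothetical helper (not part of Biopython), the synonym list is whatever you supply (P53 for TP53 is taken from the question), and using the `[Title/Abstract]` field tag instead of `[gene]` is my assumption about what "in their abstracts" should mean.

```python
def build_pubmed_term(gene_synonyms, field="Title/Abstract", combine="OR"):
    """Build a PubMed query string from a {symbol: [synonyms]} mapping.

    Hypothetical helper: names for the same gene are always OR-ed together,
    and the per-gene groups are joined with `combine` ("OR" or "AND").
    """
    groups = []
    for symbol, synonyms in gene_synonyms.items():
        names = [symbol, *synonyms]
        tagged = " OR ".join(f"{name}[{field}]" for name in names)
        groups.append(f"({tagged})")
    return f" {combine} ".join(groups)


# The synonym lists here are illustrative; fill them in from a curated source.
term = build_pubmed_term({"TP53": ["P53"], "BRCA1": [], "CXCL12": []})
print(term)
```

The resulting string can be passed as `term=` to `Entrez.esearch(db="pubmed", ...)`. Passing `combine="AND"` would require every gene to be mentioned, which is what returned no hits above.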
3.2 years ago

If you would like to retrieve PubMed records containing a certain gene name and its aliases/synonyms, you can use the search module of the Europe PMC RESTful API with the synonym parameter set to true. This expands your query to include synonyms found in the MeSH vocabulary and the UniProt synonyms list, meaning that if you search for p53, you will also retrieve TP53, TRP53, and pp53 (for phosphorylated p53). Here is an example query in JSON: (*&pageSize=25&format=json).
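
A minimal Python sketch of such a synonym-expanded search, using only the standard library. The endpoint URL and the `resultList` / `result` / `pmid` keys in the response are assumptions based on my reading of the Europe PMC REST documentation, so check them against the current docs; `build_search_url` and `fetch_pmids` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the Europe PMC search module; verify before relying on it.
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query, synonym=True, page_size=25):
    """Return a search URL with synonym expansion switched on or off."""
    params = {
        "query": query,
        "synonym": "true" if synonym else "false",
        "pageSize": page_size,
        "format": "json",
    }
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"


def fetch_pmids(query, **kwargs):
    """Fetch one page of results and return the PMIDs found (needs network)."""
    with urlopen(build_search_url(query, **kwargs)) as response:
        payload = json.load(response)
    # Assumed response layout: {"resultList": {"result": [{"pmid": ...}, ...]}}
    results = payload.get("resultList", {}).get("result", [])
    return [r["pmid"] for r in results if "pmid" in r]


url = build_search_url("p53")
print(url)
```

Calling `fetch_pmids("p53")` would then return the PMIDs from the first page only; larger result sets need paging, which this sketch leaves out.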

Alternatively, you can retrieve articles that contain gene mentions using the Annotations API. This is based on public text-mining data that is available programmatically in Europe PMC. Gene names are one of the text-mined entity types, and you can retrieve a list of all articles that contain a specific gene name in the abstract or full text. The text mining ensures that all synonyms are taken into account. You can try it out here:!/annotations45api45controller/getAnnotationsArticlesByEntityUsingGET. Input the name of the entity (in your case, p53) and select the output format. If you only want a list of PMIDs, go with ID_LIST. An example would be this search for p53:

Disclaimer: I work for Europe PMC


@Maria: Please consider creating a separate tool post for describing PMC API instead of posting this information in multiple old threads. You can describe full functionality of the tool in one place.


Oh, great point. I had not thought about it.

