Question: Retrieve PubMed records based on some genes
4 weeks ago, willson wrote:


I am trying to retrieve PubMed records with the Biopython library based on some gene names (e.g. all PMIDs whose abstracts contain these gene names). I wrote the following code and it returns some results, but I am not sure that it is working correctly. Will this code miss articles that use alternative gene symbols (e.g. P53 for TP53) or synonyms? Also, can I trust PubMed's filtering with this approach, or should I download all of the abstracts and search/filter them myself?

from Bio import Entrez, Medline

Entrez.email = "you@example.com"  # placeholder address; NCBI asks callers to identify themselves

handle = Entrez.esearch(db="pubmed", term="TP53[gene] AND BRCA1[gene] AND CXCL12[gene]")
record = Entrez.read(handle)  # parse the XML search result
handle.close()
idlist = record["IdList"]
handle = Entrez.efetch(db="pubmed", id=idlist, rettype="medline", retmode="text")  # see MEDLINE format table
records = list(Medline.parse(handle))
handle.close()
for record in records:
    print("title:", record.get("TI", "?"))
    print("authors:", record.get("AU", "?"))
    print("source:", record.get("SO", "?"))
    print("abstract:", record.get("AB", "?"))
pubmed biopython python gene

I am going to make some general comments.

You will want to use OR instead of AND in your query: with NCBI eUtils I get no hits when all three genes in the example above are combined with AND. Many hits appear when the terms are used individually or combined with OR. What is your ultimate aim? There must be a lot of records in PubMed matching these terms.

My search was done using:

esearch -db pubmed -query "TP53[gene] OR BRCA1[gene] OR CXCL12[gene]" | efetch -format abstract
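
For reference, a roughly equivalent search in Biopython might look like the sketch below. It makes a few assumptions that are not in the thread: the [tiab] (Title/Abstract) field tag is used because the question asks about abstracts, the synonym table is a hand-written placeholder (only P53 for TP53 comes from the question), and build_term and search_pmids are hypothetical helper names. Note that Entrez.esearch returns only the first 20 IDs unless retmax is raised, so the Count field is the better indicator of how many records match.

```python
# Hypothetical synonym table: gene symbol -> alternative names to search as well.
# Only the P53/TP53 pair comes from the question; fill in the rest yourself.
SYNONYMS = {
    "TP53": ["P53"],
    "BRCA1": [],
    "CXCL12": [],
}


def build_term(gene_synonyms, field="tiab"):
    """OR together every symbol and synonym, each restricted to one field."""
    names = []
    for symbol, synonyms in gene_synonyms.items():
        names.append(symbol)
        names.extend(synonyms)
    return " OR ".join(f"{name}[{field}]" for name in names)


def search_pmids(term, email, retmax=200):
    """Run the search and return (total hit count, list of up to retmax PMIDs)."""
    # Imported here so build_term can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks callers to identify themselves
    handle = Entrez.esearch(db="pubmed", term=term, retmax=retmax)
    record = Entrez.read(handle)
    handle.close()
    return int(record["Count"]), list(record["IdList"])


if __name__ == "__main__":
    # Prints the query string only; no network request is made here.
    print(build_term(SYNONYMS))
```

Calling search_pmids(build_term(SYNONYMS), email="you@example.com") (placeholder address) would then return the total count and the first retmax PMIDs. Comparing that count against the number of IDs returned shows whether the ID list has been truncated.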
written 4 weeks ago by genomax