Question: Obtaining A Dump Of Pubmed Abstracts Based On Keywords
7.0 years ago by
ruchiksy50 wrote:


I am new to text mining and am looking for a way to obtain a dump of PubMed abstracts based on a keyword input. I would like to do this in Python, since I have some experience with it. I would appreciate a sample script template that I can expand on.


----------------------------------------- E D I T -------------------------------------------------

I have a list of PubMed IDs and am wondering how to incorporate them into a Python script that will then fetch their corresponding abstracts.

modified 3.1 years ago by Naomi Most60 • written 7.0 years ago by ruchiksy50

Search Biostar for NCBI Eutilities & PubMed.

written 7.0 years ago by Pierre Lindenbaum129k
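
For the list-of-IDs case in the question's edit, the NCBI E-utilities route suggested above could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not a tested pipeline: the helper names (read_pmids, batches, efetch_url, fetch_abstracts), the batch size of 200 and the 0.4 s pause are illustrative choices. Only the EFetch endpoint and its db, id, rettype and retmode parameters are NCBI's.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities EFetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_pmids(lines):
    """Return the PubMed IDs from an iterable of lines (one ID per line)."""
    return [line.strip() for line in lines if line.strip()]


def batches(items, size=200):
    """Yield consecutive chunks of at most `size` items (size is an assumption)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def efetch_url(pmids):
    """Build an EFetch URL that asks for plain-text abstracts of `pmids`."""
    params = {
        "db": "pubmed",
        "id": ",".join(pmids),
        "rettype": "abstract",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def fetch_abstracts(pmids, pause=0.4):
    """Download abstracts batch by batch; returns one text blob per batch."""
    texts = []
    for chunk in batches(pmids):
        with urlopen(efetch_url(chunk)) as response:
            texts.append(response.read().decode("utf-8"))
        time.sleep(pause)  # assumed pause to stay under NCBI's request-rate limit
    return texts
```

With the IDs stored one per line in a file, something like `fetch_abstracts(read_pmids(open("pmids.txt")))` would return the abstracts as text (the file name is hypothetical). NCBI asks clients without an API key to stay at or below three requests per second, which is why the sketch pauses between batches.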
3.1 years ago by
Naomi Most60 wrote:

You can do this pretty easily using metapub.

Let's say you're interested in every abstract pertaining to "breast neoplasm":

from metapub import PubMedFetcher
fetch = PubMedFetcher()

# get the first 1000 pmids matching "breast neoplasm" keyword search
pmids = fetch.pmids_for_query('breast neoplasm', retmax=1000)

# get abstract for each article:
abstracts = {}
for pmid in pmids:
    abstracts[pmid] = fetch.article_by_pmid(pmid).abstract

That's it. :)

There are a lot of knobs and dials in the PubMedFetcher.pmids_for_query function. Notably, it supports all of the PubMed query keywords (e.g. journal name as "TA"), and also allows specifying pmc_only (boolean) so you can restrict just to PMC if you like.

written 3.1 years ago by Naomi Most60
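
To make the point about PubMed query keywords concrete, here is a small sketch that composes a query string with standard PubMed field tags ([TA] for journal title abbreviation, [DP] for date of publication). The function name pubmed_query and its parameters are hypothetical. The resulting string is ordinary PubMed search syntax; passing it to pmids_for_query as the first argument, as in the answer above, is an assumption about how metapub handles tagged queries rather than something confirmed here.

```python
def pubmed_query(term, journal=None, year=None):
    """Combine a search term with optional PubMed field tags.

    pubmed_query and its arguments are illustrative names, not part of metapub.
    """
    parts = ["(%s)" % term]                  # parenthesize so OR inside term stays grouped
    if journal:
        parts.append('"%s"[TA]' % journal)   # TA = journal title abbreviation
    if year:
        parts.append("%s[DP]" % year)        # DP = date of publication
    return " AND ".join(parts)


query = pubmed_query("breast neoplasm", journal="Nature", year=2015)
# Assumed usage with the fetcher from the answer above:
# pmids = fetch.pmids_for_query(query, retmax=1000)
```

The same string can be pasted into the PubMed search box to check what it matches before running a large download.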

I have installed metapub (pip install metapub==) but when I tried to use the code I got the error: ModuleNotFoundError: No module named 'metapub'. Any idea what is wrong? I'm a beginner in Python.

written 2.9 years ago by leireahedo020

Sir, I now have all the abstracts pertaining to breast neoplasm. From this data, how can I extract all the gene names that appear in the abstracts?

written 16 months ago by venkatarao1421520

Hi there -- if you want to pull all the gene names out of the abstracts, the easiest way would be to download a comprehensive list of gene names from the HGNC or from NCBI's Gene database.

There are several ways to do either of those things. The simplest way, getting a list of gene names, just means you insert all of the gene names into a hash or a set, and then see if you can match the gene names in each of the abstracts.

written 8 months ago by Naomi Most60
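
The set-lookup idea described above can be sketched as follows. This is a minimal illustration: find_genes and genes_by_pmid are hypothetical names, the three symbols stand in for a full list downloaded from HGNC, and the matching is a plain case-sensitive token comparison, so symbols that double as ordinary words or abbreviations will produce false positives.

```python
import re


def find_genes(abstract, gene_symbols):
    """Return the known gene symbols that occur as whole tokens in `abstract`."""
    # Split on anything that is not a letter, digit or hyphen,
    # so "BRCA1," and "(TP53)" still yield the bare symbols.
    tokens = set(re.findall(r"[A-Za-z0-9-]+", abstract))
    return tokens & gene_symbols


def genes_by_pmid(abstracts, gene_symbols):
    """Map each PubMed ID to the gene symbols found in its abstract."""
    return {
        pmid: find_genes(text, gene_symbols)
        for pmid, text in abstracts.items()
        if text  # skip records that have no abstract
    }


# Stand-in for a complete symbol list (assumption: loaded from an HGNC download).
genes = {"BRCA1", "BRCA2", "TP53"}
sample = {"1": "Mutations in BRCA1 and (TP53) were found; brca2 was not tested.",
          "2": None}
hits = genes_by_pmid(sample, genes)
```

Here `abstracts` has the same shape as the dictionary built in the metapub answer, so the two pieces could be chained. Whether to match case-insensitively or to include gene aliases is a design choice that trades recall against false positives.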
7.0 years ago by
aravind ramesh520 wrote:

Refsense will help you.

modified 7.0 years ago • written 7.0 years ago by aravind ramesh520

I would like to do the same but by using a script on my terminal which accepts keywords as inputs and then fetches the info from the pubmed website.

written 7.0 years ago by ruchiksy50
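
A terminal script of the kind described above might start from the sketch below, which takes keywords on the command line and asks NCBI's ESearch endpoint for matching PubMed IDs. The function names, the --retmax option and its default of 100 are illustrative assumptions. The endpoint, its db, term, retmax and retmode parameters, and the esearchresult/idlist layout of the JSON reply are NCBI's.

```python
import argparse
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def parse_args(argv=None):
    """Parse keywords and an optional result limit from the command line."""
    parser = argparse.ArgumentParser(description="Search PubMed by keyword.")
    parser.add_argument("keywords", nargs="+", help="search terms")
    parser.add_argument("--retmax", type=int, default=100,
                        help="maximum number of PubMed IDs to return")
    return parser.parse_args(argv)


def esearch_url(keywords, retmax):
    """Build an ESearch URL that returns matching PubMed IDs as JSON."""
    params = {
        "db": "pubmed",
        "term": " ".join(keywords),
        "retmax": retmax,
        "retmode": "json",
    }
    return ESEARCH + "?" + urlencode(params)


def search(keywords, retmax):
    """Query ESearch and return the list of PubMed IDs (needs network access)."""
    with urlopen(esearch_url(keywords, retmax)) as response:
        return json.load(response)["esearchresult"]["idlist"]


def main(argv=None):
    """Print one PubMed ID per line for the given keywords."""
    args = parse_args(argv)
    for pmid in search(args.keywords, args.retmax):
        print(pmid)
```

Calling `main()` at the bottom of a file saved as, say, pubmed_search.py (a hypothetical name) would allow `python pubmed_search.py breast neoplasm --retmax 50 > pmids.txt`, and the resulting ID list can then be fed to whichever abstract-fetching method you prefer.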

See the new edit. :) If your problem is solved, accept the answer by clicking the check mark to the left of the answer.

modified 7.0 years ago • written 7.0 years ago by aravind ramesh520

Refsense looks like it may help me. Thanks!

written 7.0 years ago by ruchiksy50
4.3 years ago by
Rio de Janeiro / National Cancer Institute (INCA)
lcordeiro30 wrote:

I've just gone through the same process, though I'm not using Python. This is what I did: I ran a normal PubMed query, saved the PubMed IDs of all results in a text file, and followed Nakao's solution here: Getting Tab-Delimited Pmids And Abstracts From Pubmed

I've pasted my modified version of his bash script below, in case it helps. This version writes a counter, the PubMed ID and the abstract associated with it to standard output (you'll need to redirect the output to your output file). The PubMed IDs are in the pmid01.txt file. Some PubMed IDs don't have an abstract; for those you'll need to dig into the article's text manually for completeness.

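# Retrieve abstracts for the PubMed IDs listed in pmid01.txt (one ID per line)
count=0
while read -r i; do
    count=$((count + 1))
    # print the counter and the PubMed ID, separated by a tab
    printf '%s\t%s\n' "$count" "$i"
    # NOTE: the service URL prefix that preceded $i is not preserved in this copy
    curl "$i/abstract"
    printf '\n'
done < pmid01.txt

modified 4.3 years ago • written 4.3 years ago by lcordeiro30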
6.6 years ago by
reachtoskumar10 wrote:

You can also give BioGyan a try. It is a comprehensive search tool designed specifically for biologists, enabling search, annotation and ranking of scientific literature from public databases. Your search results will be saved on your local machine.

written 6.6 years ago by reachtoskumar10