Question

Obtaining A Dump Of Pubmed Abstracts Based On Keywords

2

10.8 years ago

ruchiksy ▴ 50

Hello,

I am new to text mining and am looking for a way to obtain a dump of PubMed abstracts based on a keyword input. I would like to do this in Python, since I have some experience with it. I would appreciate a sample script template that I can build on.

Thanks,

----------------------------------------- E D I T -------------------------------------------------

I have a list of PubMed IDs and am wondering how to incorporate them into a Python script that will then fetch their corresponding abstracts.

• 11k views

ADD COMMENT • link updated 6.9 years ago by Naomi Most ▴ 60 • written 10.8 years ago by ruchiksy ▴ 50

1

Search Biostar for "NCBI E-utilities" & "PubMed".
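A minimal sketch of what an E-utilities request can look like from Python, using only the standard library. It assumes the public ESearch and EFetch endpoints under `https://eutils.ncbi.nlm.nih.gov/entrez/eutils`; the function names (`esearch_url`, `efetch_url`, `search_and_fetch`) are made up for illustration and are not part of any library.

```python
import json
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, retmax=100):
    """Build an ESearch URL that returns PubMed IDs matching `term` as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS + "/esearch.fcgi?" + urllib.parse.urlencode(params)


def efetch_url(pmids):
    """Build an EFetch URL that returns plain-text abstracts for `pmids`."""
    params = {
        "db": "pubmed",
        "id": ",".join(str(p) for p in pmids),
        "rettype": "abstract",
        "retmode": "text",
    }
    return EUTILS + "/efetch.fcgi?" + urllib.parse.urlencode(params)


def search_and_fetch(term, retmax=100):
    """Run the keyword search, then download the abstracts as one text blob."""
    with urllib.request.urlopen(esearch_url(term, retmax)) as resp:
        pmids = json.load(resp)["esearchresult"]["idlist"]
    if not pmids:
        return ""
    with urllib.request.urlopen(efetch_url(pmids)) as resp:
        return resp.read().decode("utf-8")
```

Calling `search_and_fetch("breast neoplasm", retmax=20)` needs network access. For long ID lists it is probably safer to fetch in batches rather than put every ID in one URL.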

ADD REPLY • link 10.8 years ago by Pierre Lindenbaum 161k

score 3 · Answer 1 · 2017-05-22

3

6.9 years ago

Naomi Most ▴ 60

You can do this pretty easily using metapub.

https://pypi.python.org/pypi/metapub

Let's say you're interested in every abstract pertaining to "breast neoplasm":

from metapub import PubMedFetcher
fetch = PubMedFetcher()

# get the first 1000 pmids matching "breast neoplasm" keyword search
pmids = fetch.pmids_for_query('breast neoplasm', retmax=1000)

# get abstract for each article:
abstracts = {}
for pmid in pmids:
    abstracts[pmid] = fetch.article_by_pmid(pmid).abstract

That's it. :)

There are a lot of knobs and dials in the PubMedFetcher.pmids_for_query function. Notably, it supports all of the PubMed query keywords (e.g. journal name as "TA"), and also allows specifying pmc_only (boolean) so you can restrict just to PMC if you like.
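If you already have a list of PubMed IDs (as in the question's edit), the same `article_by_pmid(...).abstract` call can be driven from a file. This is a sketch under assumptions: the file holds one PMID per line, and the helper names `load_pmids` and `fetch_abstracts` and the file name `pmids.txt` are hypothetical. The fetcher is passed in as an argument so the loop can be tried out without network access.

```python
def load_pmids(path):
    """Read one PubMed ID per line, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def fetch_abstracts(pmids, fetcher):
    """Map each PMID to its abstract using a PubMedFetcher-like object."""
    abstracts = {}
    for pmid in pmids:
        article = fetcher.article_by_pmid(pmid)
        # Some records have no abstract, so this value may be empty.
        abstracts[pmid] = article.abstract
    return abstracts


# Usage (requires metapub and network access; the file name is hypothetical):
# from metapub import PubMedFetcher
# abstracts = fetch_abstracts(load_pmids("pmids.txt"), PubMedFetcher())
```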

ADD COMMENT • link 6.9 years ago by Naomi Most ▴ 60

0

I have installed metapub (pip install metapub==0.4.3.6), but when I tried to use the code I got the error: ModuleNotFoundError: No module named 'metapub'. Any idea what is wrong? I'm a beginner in Python.

ADD REPLY • link 6.7 years ago by leireahedo02 • 0

0

Sir, I now have all of the abstracts related to breast neoplasm. From this data, how can I extract all of the gene names that appear in the abstracts?

ADD REPLY • link 5.1 years ago by venkatarao142152 • 0

0

Hi there, if you want to pull all the gene names out of the abstracts, the easiest way would be to download a comprehensive list of gene names from the HGNC or from NCBI's Gene database.

There are several ways to get either list. Once you have one, the simplest approach is to load all of the gene names into a hash or a set, and then check which of them match in each of the abstracts.
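A minimal sketch of the set-matching idea, assuming the gene symbols have already been loaded into a Python set and the abstracts into a dict keyed by PMID. The four symbols and two abstracts below are made-up sample data, and `find_gene_symbols` is a hypothetical helper name.

```python
import re


def find_gene_symbols(abstract, gene_symbols):
    """Return the known gene symbols that appear as whole tokens in `abstract`."""
    # Tokens are runs of letters, digits, and hyphens, e.g. "BRCA1" or "HLA-A".
    tokens = set(re.findall(r"[A-Za-z0-9][A-Za-z0-9-]*", abstract))
    return tokens & gene_symbols


# Hypothetical sample data; a real run would load the full HGNC or NCBI Gene list.
gene_symbols = {"BRCA1", "BRCA2", "TP53", "ERBB2"}
abstracts = {
    "1": "Mutations in BRCA1 and TP53 were observed in the cohort.",
    "2": "No association with known susceptibility genes was found.",
}

genes_by_pmid = {pmid: find_gene_symbols(text, gene_symbols)
                 for pmid, text in abstracts.items()}
```

Matching is case-sensitive and exact, so hyphenated forms such as "BRCA1-mutated" are missed, and symbols that are also ordinary words can produce false hits. Treat the output as a first pass.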

ADD REPLY • link 4.4 years ago by Naomi Most ▴ 60

score 1 · Answer 2 · 2013-07-01

1

10.8 years ago

aravind ramesh ▴ 540

Refsense will help you.

ADD COMMENT • link 10.8 years ago by aravind ramesh ▴ 540

0

I would like to do the same, but with a script run from my terminal that accepts keywords as input and then fetches the information from the PubMed website.

ADD REPLY • link 10.8 years ago by ruchiksy ▴ 50

0

See the new edit. :) If your problem is solved, accept the answer by checking the tick mark to the left of the answer.

ADD REPLY • link 10.8 years ago by aravind ramesh ▴ 540

0

Refsense looks like it may help me. Thanks!

ADD REPLY • link 10.8 years ago by ruchiksy ▴ 50

score 1 · Answer 3 · 2016-03-26

I've just gone through the same process, though I'm not using Python. This is what I did: I ran a normal PubMed query, saved the PubMed IDs of all results in a text file, and followed Nakao's solution here: Getting Tab-Delimited Pmids And Abstracts From Pubmed

I've pasted my modified version of his bash script below, in case it helps. This version writes a counter, the PubMed ID, and the abstract associated with it to standard output (you'll need to redirect the output to your output file). The PubMed IDs are in the pmid01.txt file. Some PubMed IDs don't have an abstract; for completeness, you'll need to dig through the article's text manually.

# Retrieve abstracts for each PubMed ID listed in pmid01.txt (one ID per line)
count=1
while read -r pmid; do
    # Print the counter and the PubMed ID, tab-separated, on one line
    printf '%s\t%s\n' "$count" "$pmid"
    # Fetch the abstract for this ID from the TogoWS REST service
    curl "http://togows.dbcls.jp/entry/ncbi-pubmed/$pmid/abstract"
    printf '\n'
    ((count++))
done < pmid01.txt

score 0 · Answer 4 · 2013-11-26

0

10.4 years ago

reachtoskumar ▴ 10

You can also give BioGyan (http://www.biogyan.com/) a try. It is a comprehensive search tool specially designed for biologists, enabling search, annotation, and ranking of scientific literature from public databases. Your search results will be saved on your local machine.

ADD COMMENT • link 10.4 years ago by reachtoskumar ▴ 10