Obtaining A Dump Of Pubmed Abstracts Based On Keywords
10.8 years ago
ruchiksy ▴ 50

Hello,

I am new to text mining and am looking for a way to obtain a dump of PubMed abstracts based on a keyword input. I would like to do this in Python, since I have some experience with it. I would appreciate a sample script template that I can build on.

Thanks,

----------------------------------------- E D I T -------------------------------------------------

I have a list of PubMed IDs and am wondering how to incorporate them into a Python script that will then fetch the corresponding abstract for each one.


Search Biostar for NCBI E-utilities and PubMed.
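For reference, here is a minimal sketch of what a keyword search through NCBI E-utilities could look like in Python, using only the standard library. The `esearch.fcgi` endpoint and its `db`, `term`, `retmax`, and `retmode` parameters belong to E-utilities; the helper names (`build_esearch_url`, `parse_pmids`, `search_pubmed`) are made up for this illustration, and the network call is left commented out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, retmax=100):
    """Build an ESearch URL that returns PubMed IDs for a keyword query as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def parse_pmids(esearch_json):
    """Extract the list of PMIDs from an ESearch JSON response body."""
    return json.loads(esearch_json)["esearchresult"]["idlist"]


def search_pubmed(term, retmax=100):
    """Run the search over the network (hypothetical helper; needs internet access)."""
    with urlopen(build_esearch_url(term, retmax)) as response:
        return parse_pmids(response.read().decode("utf-8"))


# Example call (not executed here because it needs network access):
# pmids = search_pubmed("breast neoplasm", retmax=5)
```

The returned PMIDs can then be handed to EFetch to retrieve the abstracts themselves.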

6.9 years ago
Naomi Most ▴ 60

You can do this pretty easily using metapub.

https://pypi.python.org/pypi/metapub

Let's say you're interested in every abstract pertaining to "breast neoplasm":

from metapub import PubMedFetcher
fetch = PubMedFetcher()

# get the first 1000 pmids matching "breast neoplasm" keyword search
pmids = fetch.pmids_for_query('breast neoplasm', retmax=1000)

# get abstract for each article:
abstracts = {}
for pmid in pmids:
    abstracts[pmid] = fetch.article_by_pmid(pmid).abstract

That's it. :)

There are a lot of knobs and dials in the PubMedFetcher.pmids_for_query function. Notably, it supports all of the PubMed query keywords (e.g. journal name as "TA"), and also allows specifying pmc_only (boolean) so you can restrict just to PMC if you like.
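As a rough illustration of the journal restriction mentioned above, the sketch below builds a query string using PubMed's own field-tag syntax. It assumes the string is passed through to PubMed unchanged; `build_pubmed_query` is a hypothetical helper, not part of metapub, and the journal abbreviation is only an example value.

```python
def build_pubmed_query(keywords, journal=None):
    """Combine free-text keywords with an optional journal restriction."""
    query = keywords
    if journal:
        # [TA] is PubMed's field tag for the journal title abbreviation
        query = f'({keywords}) AND "{journal}"[TA]'
    return query


query = build_pubmed_query("breast neoplasm", journal="Nat Genet")
print(query)

# With metapub installed, the string could be used as in the snippet above:
# pmids = PubMedFetcher().pmids_for_query(query, retmax=1000)
```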


I have installed metapub (pip install metapub==0.4.3.6), but when I tried to run the code I got the error: ModuleNotFoundError: No module named 'metapub'. Any idea what is wrong? I'm a beginner in Python.


Sir, I now have all the abstracts related to breast neoplasm. From this data, how can I extract all the gene names that appear in the abstracts?


Hi there -- if you want to pull all the gene names out of the abstracts, the easiest way would be to download a comprehensive list of gene names from the HGNC or from NCBI's Gene database.

There are several ways to do either of those things. The simplest approach, once you have the list, is to insert all of the gene names into a hash or a set and then check which of them match in each abstract.
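A minimal sketch of that set-based matching, under some loud assumptions: the three-symbol set stands in for a full HGNC or NCBI Gene download, the PMIDs and abstract texts are placeholders, and `find_gene_names` is a name made up for this example.

```python
import re


def find_gene_names(abstract, gene_symbols):
    """Return the known gene symbols that occur as whole tokens in an abstract."""
    # Tokenize on anything that is not a letter, digit, or hyphen,
    # so that "BRCA1," or "(TP53)" still match the bare symbol.
    tokens = set(re.findall(r"[A-Za-z0-9-]+", abstract))
    return tokens & gene_symbols


# Toy symbol set; a real run would load the full list from a downloaded file
gene_symbols = {"BRCA1", "BRCA2", "TP53"}

# Placeholder PMIDs and texts, shaped like the `abstracts` dict built with metapub
abstracts = {
    "111": "Mutations in BRCA1 and TP53 were frequent in this cohort.",
    "222": "No relevant variants were reported.",
}

genes_by_pmid = {
    pmid: find_gene_names(text, gene_symbols) for pmid, text in abstracts.items()
}
print(genes_by_pmid)
```

Matching here is exact and case-sensitive, so symbols that coincide with ordinary words or abbreviations will still produce false positives, and gene names written out in full will be missed.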

10.8 years ago

Refsense will help you.


I would like to do the same but by using a script on my terminal which accepts keywords as inputs and then fetches the info from the pubmed website.


See the new edit. :) If your problem is solved, accept the answer by clicking the check mark to the left of the answer.


Refsense looks like it may help me. Thanks!

8.1 years ago
lcordeiro ▴ 40

I've just gone through the same process, though I'm not using Python. This is what I did: I ran a normal PubMed query, saved the PubMed IDs of all results in a text file, and followed Nakao's solution here: Getting Tab-Delimited Pmids And Abstracts From Pubmed

I've pasted my modified version of his bash script below, in case it helps. This version writes a counter, the PubMed ID, and the associated abstract to standard output (you'll need to redirect the output to your output file). The PubMed IDs are read from the pmid01.txt file. Some PubMed IDs don't have an abstract; for those you'll need to dig into the article's text manually for completeness.

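# Retrieve abstracts for each PubMed ID listed in pmid01.txt (one ID per line)
count=1
while read -r pmid; do
    # Skip blank lines in the ID file
    [ -n "$pmid" ] || continue
    # Print the counter and the PubMed ID, tab-separated, on one line
    printf '%s\t%s\n' "$count" "$pmid"
    # Fetch the abstract for this ID from TogoWS
    curl -s "http://togows.dbcls.jp/entry/ncbi-pubmed/$pmid/abstract"
    printf '\n'
    count=$((count + 1))
done < pmid01.txt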
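Since the original question asked for Python, here is a rough Python counterpart of the loop above. Note that it swaps TogoWS for NCBI's EFetch endpoint and requests IDs in batches rather than one at a time; the batch size of 100 and all helper names are assumptions made for this sketch, and the network call is left commented out.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_pmids(lines):
    """Return the non-empty, stripped PubMed IDs from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_efetch_url(pmids):
    """Build an EFetch URL that returns plain-text abstracts for a batch of PMIDs."""
    params = {
        "db": "pubmed",
        "id": ",".join(pmids),
        "rettype": "abstract",
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


def fetch_abstracts(path, batch_size=100):
    """Print abstracts for every PMID in the file (needs internet access)."""
    with open(path) as handle:
        pmids = read_pmids(handle)
    for batch in chunked(pmids, batch_size):
        with urlopen(build_efetch_url(batch)) as response:
            print(response.read().decode("utf-8"))


# Example call, using the file name from the bash script:
# fetch_abstracts("pmid01.txt")
```

As with the bash version, the output goes to standard output and can be redirected to a file.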
10.4 years ago

You can also try BioGyan (http://www.biogyan.com/). It is a comprehensive search tool designed specifically for biologists, enabling search, annotation, and ranking of scientific literature from public databases. Your search results will be saved on your local machine.
