Pubmed Id From Abstract
8.1 years ago
win ▴ 890

Hi all, I have several paper abstracts but not their PubMed IDs. Is there a way to find the PubMed ID for a given abstract? Even a list of probable IDs for each abstract would work for us.

any thoughts / code / ideas would be much appreciated.

thanks.

pubmed • 3.0k views

Only abstracts? No titles?


right, we have only abstracts


Did you try EndNote? As far as I know, it can import references from unformatted text; once you have done that, you can get the PMCID easily (see this blog post: http://everythingendnote.blogspot.com/2010/04/importing-pmc-ids-from-pubmed.html)

8.1 years ago
pdrebi ▴ 20

You could use part of the abstract to search Europe PMC. For example, just paste in a sentence: "Simplicity has made C. elegans pharyngeal development a particularly well-studied subject." If you have them, you could also use the first author, journal name, publication year and volume. A help page on Europe PMC lists the available search fields with examples: http://europepmc.org/Help. These search fields can also be used programmatically; details are available under Europe PMC Web Services, reached through the 'Resources' menu on the home page.
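As a rough illustration of the programmatic route, here is a minimal Python sketch that sends one sentence to the Europe PMC REST search service as an exact phrase. The function names are made up for this example, and the response field names (`resultList`, `result`, `pmid`) are my assumption about the JSON layout, so check them against the Web Services documentation before relying on this.

```python
import json
import urllib.parse
import urllib.request

# Europe PMC REST search endpoint; JSON output is requested with format=json.
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(sentence, page_size=5):
    """Build a search URL that looks for one sentence as an exact phrase."""
    # Drop double quotes so the sentence can be wrapped in quotes safely.
    phrase = '"%s"' % sentence.replace('"', " ").strip()
    params = {"query": phrase, "format": "json", "pageSize": page_size}
    return EUROPE_PMC_SEARCH + "?" + urllib.parse.urlencode(params)


def extract_pmids(response):
    """Collect PubMed IDs from a decoded JSON response.

    Assumes hits are listed under resultList -> result and that each hit
    indexed in PubMed carries a "pmid" field; hits without one are skipped.
    """
    results = response.get("resultList", {}).get("result", [])
    return [hit["pmid"] for hit in results if "pmid" in hit]


def search_sentence(sentence):
    """Run the search over the network and return candidate PubMed IDs."""
    with urllib.request.urlopen(build_search_url(sentence)) as handle:
        return extract_pmids(json.load(handle))
```

Calling `search_sentence(...)` with a distinctive sentence from the abstract should return a short list of candidate IDs; a less distinctive sentence may need to be combined with other fields such as author or year.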

8.1 years ago
Pappu ★ 1.9k

You can do it in BioPython: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc128


Would you care to expand a little?

7.6 years ago
Hayssam ▴ 280

The script below should get you started with a Python solution. PubMed generally does a very good job of interpreting free-text queries, but some of your abstracts might still need some pre- or post-processing. Depending on the accuracy you are aiming for, you might need to fetch the abstracts for the IDs that match your query, compare each one with your own abstract, and decide whether the two are "close enough". Python's difflib module might help with that. Note also that you are only allowed a limited number of queries per second (NCBI's E-utilities guidelines allow about 3 requests per second without an API key). If you have a large study to perform (say 1 million records), try to get a mirror of the MEDLINE database and perform the matches locally. Let us know if you need more help with the pre/post-processing step.

Before using the script, set the Entrez.email variable to your own address. The PubMed administrators may need to contact you if you exceed fair usage of the database (instead of blocking your IP!).

from Bio import Entrez
an_abstract = """
Background
Uncovering the relationship between the conserved chromosomal segments and the functional relatedness of elements within these segments is an important question in computational genomics. We build upon the series of works on gene teams and homology teams.

Results
Our primary contribution is a local sliding-window SYNS (SYNtenic teamS) algorithm that refines an existing family structure into orthologous sub-families by analyzing the neighborhoods around the members of a given family with a locally sliding window. The neighborhood analysis is done by computing conserved gene clusters. We evaluate our algorithm on the existing homologous families from the Genolevures database over five genomes of the Hemyascomycete phylum.

Conclusions
The result is an efficient algorithm that works on multiple genomes, considers paralogous copies of genes and is able to uncover orthologous clusters even in distant genomes. Resulting orthologous clusters are comparable to those obtained by manual curation.
"""


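Entrez.email = "you@gmail.com"  # replace with your own address

# Use the lower-cased abstract, with periods replaced by spaces, as a free-text query.
query = "(%s)" % an_abstract.replace(".", " ").lower()
handle = Entrez.esearch(db="pubmed", term=query)
search_results = Entrez.read(handle)
handle.close()

id_list = search_results["IdList"]
print(id_list)
if id_list:  # the search may return no hits at all
    print("http://www.ncbi.nlm.nih.gov/pubmed/%s" % id_list[0])  # http://www.ncbi.nlm.nih.gov/pubmed/22151970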
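For the "close enough" check mentioned above, a minimal difflib sketch could look like the following. It assumes you have already fetched the abstract text for each candidate ID and stored it in a dict; the function names and the 0.9 threshold are arbitrary choices for illustration, not tuned values.

```python
import difflib


def words(text):
    """Lower-case the text and split it into words, ignoring whitespace layout."""
    return text.lower().split()


def similarity(abstract_a, abstract_b):
    """Return a word-level difflib similarity ratio between 0.0 and 1.0."""
    # autojunk=False switches off difflib's "popular element" heuristic,
    # which can distort the ratio on sequences of 200 or more items.
    matcher = difflib.SequenceMatcher(
        None, words(abstract_a), words(abstract_b), autojunk=False
    )
    return matcher.ratio()


def best_match(query_abstract, candidates, threshold=0.9):
    """Pick the candidate PMID whose abstract is most similar to the query.

    `candidates` maps PMID -> abstract text. Returns (pmid, score), or
    (None, score) when even the best score is below `threshold`.
    """
    scored = [
        (similarity(query_abstract, text), pmid)
        for pmid, text in candidates.items()
    ]
    if not scored:
        return None, 0.0
    score, pmid = max(scored)
    return (pmid if score >= threshold else None), score
```

Comparing word lists rather than raw characters keeps the comparison reasonably fast on full-length abstracts and makes it insensitive to line breaks; abstracts with section labels such as "Background" or "Results" may need those stripped first to score well.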