Question: Searching For Organism Mentions In PubMed
Will (4.5k, United States) wrote, 7.5 years ago:

As a proxy for the database of host-pathogen pairs I asked about here, I was thinking about using organism co-mentions in PubMed articles. I can easily write a script that retrieves all of the PMIDs returned by a PubMed search for every organism name in the NCBI Taxonomy.
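A minimal sketch of that brute-force script, assuming NCBI's E-utilities `esearch` endpoint with JSON output. The helper names (`esearch_url`, `parse_pmids`, `fetch_pmids`, `co_mentions`) are hypothetical. Each organism name becomes a quoted phrase query, and pairs of organisms whose PMID sets overlap are the candidate co-mentions.

```python
import json
import urllib.parse
import urllib.request
from itertools import combinations

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(organism: str, retmax: int = 10000) -> str:
    """Build an esearch URL for PubMed records matching an organism name as a phrase."""
    params = {
        "db": "pubmed",
        "term": f'"{organism}"',  # quoted phrase search; field tags could narrow it further
        "retmax": str(retmax),
        "retmode": "json",
    }
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_pmids(payload: str) -> set[str]:
    """Extract the PMID list from an esearch JSON response body."""
    return set(json.loads(payload)["esearchresult"]["idlist"])


def fetch_pmids(organism: str) -> set[str]:
    """Run one esearch query over the network (needs throttling for bulk use)."""
    with urllib.request.urlopen(esearch_url(organism)) as resp:
        return parse_pmids(resp.read().decode("utf-8"))


def co_mentions(pmids_by_organism: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """For every unordered pair of organisms, keep the PMIDs returned by both searches."""
    pairs = {}
    for a, b in combinations(sorted(pmids_by_organism), 2):
        shared = pmids_by_organism[a] & pmids_by_organism[b]
        if shared:
            pairs[(a, b)] = shared
    return pairs
```

NCBI rate-limits E-utilities (about three requests per second without an API key) and caps how many IDs a single query returns, so a run over the whole taxonomy would need throttling and paging on top of this sketch.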

However, I was wondering if anyone knew of a more 'elegant' way of doing this. Does anyone know of an NLP tool for finding organism mentions in PubMed, or perhaps a database that annotates them?

Thanks.

Nathan Harmston (1.1k, London) wrote, 7.5 years ago:

So one way of doing this might be to download all of MEDLINE and then try running a species tagger on the abstracts.

Two recent species named-entity recognition (NER) taggers:

OrganismTagger http://www.semanticsoftware.info/organism-tagger

LINNAEUS http://linnaeus.sourceforge.net/

Both of these systems identify species mentions in text and then map them to NCBI Taxonomy identifiers. Each system has its own issues and biases, and both basically apply regular expression matching over the text (OrganismTagger also uses an SVM to identify strain information). While performance is decent, you will inevitably get a lot of false positives and false negatives. HTH
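To make the dictionary/regular-expression idea concrete, here is a minimal tagger in Python. It is not the code of LINNAEUS or OrganismTagger; the tiny name-to-taxon dictionary is a hypothetical stand-in for the full NCBI Taxonomy name list, and `build_pattern`/`tag_species` are illustrative names. Names are tried longest-first so the longest match wins at each position, and lookarounds stop matches inside longer words.

```python
import re

# Hypothetical, tiny name -> NCBI Taxonomy ID dictionary. A real tagger builds
# this from the full NCBI Taxonomy names dump plus curated synonyms.
SPECIES_DICT = {
    "homo sapiens": 9606,
    "human": 9606,
    "humans": 9606,
    "mus musculus": 10090,
    "mouse": 10090,
    "escherichia coli": 562,
    "e. coli": 562,
    "plasmodium falciparum": 5833,
    "p. falciparum": 5833,
}


def build_pattern(names):
    """Compile one case-insensitive alternation, longest names first."""
    body = "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
    # (?<!\w) / (?!\w) act as word boundaries that also work for names ending in "."
    return re.compile(rf"(?<!\w)(?:{body})(?!\w)", re.IGNORECASE)


PATTERN = build_pattern(SPECIES_DICT)


def tag_species(text):
    """Return (start, end, matched text, taxid) for every dictionary hit."""
    return [
        (m.start(), m.end(), m.group(0), SPECIES_DICT[m.group(0).lower()])
        for m in PATTERN.finditer(text)
    ]
```

For example, `tag_species("Plasmodium falciparum infects humans, unlike E. coli K-12.")` tags three species (taxa 5833, 9606 and 562). Common-name entries such as "mouse" are exactly where false positives come from (a computer mouse would also be tagged 10090), which is why real taggers add disambiguation rules and stop lists.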

Hamish (3.1k, UK) wrote, 7.5 years ago:

Not sure if they cover exactly what you are looking for, but the Rebholz Group at EMBL-EBI provides a number of services which may be relevant:

  • Whatizit: a text processing system which features a range of pipelines (see http://www.ebi.ac.uk/webservices/whatizit/info.jsf) for identifying a wide range of biologically relevant terms. In your case, the "whatizitOrganisms", "whatizitOrganismsFilter" and "whatizitUkPmcSpecies" pipelines, which mark up NCBI Taxonomy-related terms, are likely to be of interest.

  • EBIMed: a text-mining-aware search engine for MEDLINE data. Whatizit provides annotations for use in EBIMed, including the "whatizitOrganisms" pipeline results.

Text-mining results from Whatizit are also integrated into CiteXplore and UK PubMed Central (UKPMC). You may want to look at CiteXplore and UKPMC for additional coverage of the literature, since these contain more than just MEDLINE/PubMed; see http://www.ebi.ac.uk/citexplore/showStatistics.do and http://ukpmc.ac.uk/FAQ#searchon for details of the additional sources.


You'd think after using Whatizit daily for the past ~7 months I would know that it had an organism tagger! But I guess that's what happens when your 'work blinders' are on ;)

Reply by Will (4.5k), 7.5 years ago

I would not recommend using Whatizit for organism tagging (see the issues and performance evaluation here: http://www.biomedcentral.com/1471-2105/11/85); @Nathan's suggestions show much better performance.

Reply by Casey Bergman (18k), 7.5 years ago
Casey Bergman (18k, Athens, GA, USA) wrote, 7.5 years ago:

We have parsed the MEDLINE 2011 baseline files for organism mentions with LINNAEUS. The data are available here: http://biocontext.smith.man.ac.uk/data/entities-species.csv.gz

For more information see http://biocontext.org/
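If those annotations are reduced to one row per (document, species) mention, candidate host-pathogen pairs fall out of a simple pair count per document. The sketch below assumes a gzipped CSV with a header row and columns named `doc_id` and `species_id`; those column names are hypothetical, so check the actual layout of entities-species.csv.gz and pass the right `doc_col`/`taxon_col`.

```python
import csv
import gzip
from collections import Counter, defaultdict
from itertools import combinations


def load_rows(path):
    """Stream rows from a gzipped CSV that is assumed to have a header line."""
    with gzip.open(path, "rt", newline="") as handle:
        yield from csv.DictReader(handle)


def species_by_document(rows, doc_col="doc_id", taxon_col="species_id"):
    """Group species identifiers by document (column names are assumptions)."""
    docs = defaultdict(set)
    for row in rows:
        docs[row[doc_col]].add(row[taxon_col])
    return docs


def co_mention_counts(docs):
    """Count, for each unordered species pair, how many documents mention both."""
    counts = Counter()
    for species in docs.values():
        for pair in combinations(sorted(species), 2):
            counts[pair] += 1
    return counts
```

Usage would look like `co_mention_counts(species_by_document(load_rows("entities-species.csv.gz"))).most_common(20)`. Holding every document's species set in memory is the simple choice here; for all of MEDLINE a sorted file or a database would scale better.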

boczniak767 (640, Poland) wrote, 7.5 years ago:

You could also try elise. It looks for co-occurrences of words/phrases in PubMed.

paper here
