Mining Papers On A Desired Topic Based On Certain Criteria
4
5
9.8 years ago
Arun 2.4k

I would like to obtain mainly two things. Suppose the tag (or topic) I'd like to mine is "circadian clock".

1) I'd like to find all keywords (wherever available; I guess only relatively recent papers have keywords?), or any other equivalent main words, that are associated with this topic. I am interested in creating a tag cloud or word cloud. I think it's a cool opening slide for a presentation. What do you guys think?

2) I'd like to mine for pioneering papers (major findings / breakthroughs) in this field, probably with additional restriction criteria such as humans or plants. Failing that, I'd like the most cited papers out of all available papers. This is for literature reading. Basically, how does one go about finding the papers that one should definitely have read?

I know how to generate a word cloud in R. I'd like to know if it's possible to extract this information somehow from PubMed (possibly using the tm R package?).

data text r pubmed • 2.2k views
4
9.8 years ago

An easy solution would be to search scholar.google.com for "circadian clock" and capture the top X results (X is a number of your choosing). Then, sort these by the "cited N times" field to find papers such as:

Role of the CLOCK protein in the mammalian circadian mechanism. N Gekakis, D Staknis, HB Nguyen, FC Davis… - Science, 1998 - sciencemag.org. Abstract: The mouse Clock gene encodes a bHLH-PAS protein that regulates circadian rhythms and is related to transcription factors that act as heterodimers. Potential partners of CLOCK were isolated in a two-hybrid screen, and one, BMAL1, was coexpressed with ... Cited by 915

You could even compute a metric like 915 citations / 13 years in print ≈ 70 citations/year, so that older papers don't win simply by having had more time to accumulate citations.
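A minimal sketch of that citations-per-year ranking, in Python. It assumes you have copied the title, publication year, and "Cited by" count by hand from the search results (Google Scholar has no official API that I know of). The `Paper` class, the function names, and the two "Hypothetical" entries are made up for illustration; only the CLOCK paper's numbers come from the result above, with 2011 taken as the current year so that it has been 13 years in print.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    year: int        # publication year
    citations: int   # "Cited by N" count copied from the search results


def citations_per_year(paper, current_year):
    """Citations divided by years in print (at least 1 to avoid dividing by zero)."""
    years_in_print = max(current_year - paper.year, 1)
    return paper.citations / years_in_print


def rank_by_citation_rate(papers, current_year):
    """Return papers sorted from highest to lowest citations per year."""
    return sorted(
        papers,
        key=lambda p: citations_per_year(p, current_year),
        reverse=True,
    )


papers = [
    # Real entry from the search result quoted above.
    Paper("Role of the CLOCK protein in the mammalian circadian mechanism", 1998, 915),
    # Hypothetical entries, invented only to show how the ranking behaves.
    Paper("Hypothetical older review", 1985, 1000),
    Paper("Hypothetical recent paper", 2009, 200),
]

ranked = rank_by_citation_rate(papers, current_year=2011)
for p in ranked:
    print(f"{citations_per_year(p, 2011):6.1f}  {p.title}")
```

Note that the paper with the most raw citations is not necessarily the one with the highest rate, which is the point of normalizing by years in print.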

0

This seems nice. I have yet to check whether I can get the kind of statistics and information I'm hoping for. Thank you!

2
9.8 years ago

Check out F1000.

They usually have great insight. In fact just browsing the website would be sufficient.

0

The two studies to date on the topic actually show that F1000 reviews miss the majority of highly cited papers: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0005910 and http://library.queensu.ca/ojs/index.php/IEE/article/view/2379/2478

0

Thanks Zev. It seems to be a paid service, and the studies above don't help its case either.

2
9.8 years ago

A resource like Arrowsmith has sometimes been helpful for broadening/deepening literature mining expeditions. I often find a lot of interesting connections that are not picked up by PubMed alone. I'm not sure they can give you the "most cited" data, but you could send the list through Google Scholar to get that metric, as Larry suggests above.

2
9.8 years ago

LigerCat: using "MeSH Clouds" from journal, article, or gene citations to facilitate the identification of relevant biomedical literature.

This publication also sounds relevant: A document clustering and ranking system for exploring MEDLINE citations
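The counting step behind a MeSH-style cloud can be sketched offline in a few lines of Python. This is not LigerCat's implementation, just a rough illustration: it assumes you already have, for each article, a list of its MeSH terms or keywords (for example exported from a PubMed search). The function name and the three sample records are hypothetical placeholders, not real search results.

```python
from collections import Counter


def term_frequencies(records):
    """Count in how many articles each term appears.

    `records` is a list of per-article term lists. Terms are lower-cased
    and de-duplicated within an article, so each article counts at most
    once per term.
    """
    counts = Counter()
    for terms in records:
        unique_terms = {t.strip().lower() for t in terms if t.strip()}
        counts.update(unique_terms)
    return counts


# Hypothetical per-article term lists, invented for illustration only.
records = [
    ["Circadian Rhythm", "CLOCK Proteins", "Mice"],
    ["Circadian Rhythm", "Arabidopsis"],
    ["Circadian Rhythm", "CLOCK Proteins", "Humans"],
]

freqs = term_frequencies(records)
for term, n in freqs.most_common(5):
    print(n, term)
```

The resulting term/count pairs are the kind of input a word-cloud tool expects, so they could be written out to a file and plotted in R.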

The LigerCat tag cloud for 'circadian rhythm' is provided below as an example:

0

This is so far the coolest one I have seen. Let me try this out and I'll get back to you.