Question: Mining Papers On A Desired Topic Based On Certain Criteria
7.2 years ago by
Arun2.3k wrote:

I would like to obtain mainly two things. Suppose that the tag (or topic) I'd like to mine is "circadian clock".

1) I'd like to find all keywords (wherever possible; I guess only relatively recent papers have keywords?), or any other equivalent main words, that are associated with this topic. I am interested in creating a tag cloud or word cloud. I think it's a cool opening slide for a presentation. What do you guys think?

2) I'd like to mine for pioneering papers (major findings or breakthroughs) in this field (probably with further restriction criteria, such as humans or plants) or, failing that, the most cited papers among all available papers. This is basically for literature reading. In short, how does one go about finding the papers that one should definitely have read?

I know how to generate a word cloud in R. I'd like to know if it's possible to extract this information somehow from PubMed (possibly using the tm R package?).

Thank you in advance for your suggestions, Best, Arun.

7.2 years ago by
Boston, MA USA
Larry_Parnell16k wrote:

An easy solution would be to search for "circadian clock" and capture the top X results (X is a number of your choosing). Then, sort these by the "cited N times" field to find papers such as:

Role of the CLOCK protein in the mammalian circadian mechanism
N Gekakis, D Staknis, HB Nguyen, FC Davis… - Science, 1998
Abstract: The mouse Clock gene encodes a bHLH-PAS protein that regulates circadian rhythms and is related to transcription factors that act as heterodimers. Potential partners of CLOCK were isolated in a two-hybrid screen, and one, BMAL1, was coexpressed with ...
Cited by 915

You could even compute a metric such as 915 citations / 13 years in print ≈ 70 citations/year.
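A minimal Python sketch of that citations-per-year ranking, in case it helps. The `Paper` class, the function names, and `current_year=2011` (1998 plus the 13 years in print mentioned above) are assumptions made for illustration, and the second entry is a made-up placeholder included only so there is something to sort.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    year: int        # publication year
    citations: int   # "Cited by N" count copied from the search results


def citations_per_year(paper: Paper, current_year: int) -> float:
    """Citations divided by years in print (at least 1, to avoid dividing by zero)."""
    years_in_print = max(current_year - paper.year, 1)
    return paper.citations / years_in_print


def rank_by_citation_rate(papers: list[Paper], current_year: int) -> list[Paper]:
    """Return papers sorted from highest to lowest citations per year."""
    return sorted(
        papers,
        key=lambda p: citations_per_year(p, current_year),
        reverse=True,
    )


papers = [
    # Entry taken from the search result quoted above.
    Paper("Role of the CLOCK protein in the mammalian circadian mechanism", 1998, 915),
    # Hypothetical placeholder, NOT a real paper or real citation count.
    Paper("Hypothetical placeholder paper", 2008, 300),
]

# current_year=2011 is an assumption (1998 + 13 years in print).
ranked = rank_by_citation_rate(papers, current_year=2011)
for p in ranked:
    print(f"{citations_per_year(p, 2011):6.1f} citations/yr  {p.title} ({p.year})")
```

Ranking by rate rather than raw count keeps older papers from dominating the list simply because they have had longer to accumulate citations.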

This seems nice. I have yet to check whether I can get the kind of statistics and information I'm hoping for. Thank you!

7.2 years ago by
United States
Zev.Kronenberg11k wrote:

Check out F1000.

They usually have great insight. In fact just browsing the website would be sufficient.

The two studies to date on the topic actually show that F1000 reviews miss the majority of highly cited papers:

Thanks, Zev. It seems to be a paid service, and the studies above don't help its case either.

7.2 years ago by
Rochester, NY USA
Alex Paciorkowski3.3k wrote:

A resource like Arrowsmith has sometimes been helpful for broadening/deepening literature mining expeditions. I often find a lot of interesting connections that are not picked up by PubMed alone. I'm not sure they can give you the "most cited" data, but you could send the list through Google Scholar to get that metric, as Larry suggests above.

Thanks for this link, Alex.

7.2 years ago by
Washington University School of Medicine, St. Louis, USA
Malachi Griffith17k wrote:

How about using LigerCat?

LigerCat: using "MeSH Clouds" from journal, article, or gene citations to facilitate the identification of relevant biomedical literature.

This publication also sounds relevant: A document clustering and ranking system for exploring MEDLINE citations

The LigerCat tag cloud for 'circadian rhythm' is provided below as an example:

[Image: LigerCat tag cloud for 'circadian rhythm']
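If you would rather build the term counts for such a cloud yourself, here is a rough Python sketch. It assumes you have saved your PubMed search results in the MEDLINE text format, where each MeSH heading sits on a line starting with `MH  - `. The function name `mesh_term_counts` and the sample records are made up for illustration, and this is not how LigerCat itself works internally.

```python
from collections import Counter


def mesh_term_counts(medline_text: str) -> Counter:
    """Count MeSH headings across all records in MEDLINE-format text.

    Subheadings (the part after '/') and the major-topic marker '*'
    are dropped, so 'Circadian Rhythm/*genetics' counts as 'Circadian Rhythm'.
    """
    counts = Counter()
    for line in medline_text.splitlines():
        if line.startswith("MH  - "):
            heading = line[len("MH  - "):].split("/")[0].replace("*", "").strip()
            if heading:
                counts[heading] += 1
    return counts


# Made-up sample records (placeholder PMIDs), only to show the input shape.
sample = """PMID- 0000001
MH  - Circadian Rhythm/*genetics
MH  - CLOCK Proteins
MH  - Animals

PMID- 0000002
MH  - *Circadian Rhythm
MH  - Animals
MH  - Mice
"""

counts = mesh_term_counts(sample)
for term, n in counts.most_common():
    print(f"{n}\t{term}")
```

The resulting term/frequency pairs can be written out as a two-column table and fed to whatever word cloud tool you already use in R.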

This is so far the coolest one I have seen. Let me try this out and I'll get back to you.

