Question: Mining Papers On A Desired Topic Based On Certain Criteria
7.6 years ago by
Arun2.3k wrote:

I would like to obtain two main things. Suppose that the tag (or topic) I'd like to mine is "circadian clock".

1) I'd like to find all keywords (wherever possible; I guess only relatively recent papers have keywords?), or any other equivalent main words, that are associated with this topic. I am interested in creating a tag cloud or word cloud. I think it's a cool opening slide in a presentation. What do you guys think?

2) I'd like to mine for pioneering papers (major findings / breakthroughs) in this field, probably with further restriction criteria such as humans or plants. Failing that, I'd like the most cited papers from all available papers. This is for literature reading. Basically, how does one go about finding the papers that one should definitely have read?

I know how to generate a word cloud in R. I'd like to know whether it's possible to extract this information somehow from PubMed (possibly using the tm R package?).

Thank you in advance for your suggestions, Best, Arun.

data text R pubmed • 1.8k views
7.6 years ago by
Boston, MA USA
Larry_Parnell16k wrote:

An easy solution would be to search for "circadian clock" and capture the top X results (X is a number of your choosing). Then, sort these by the "cited N times" field to find papers such as:

Role of the CLOCK protein in the mammalian circadian mechanism
N Gekakis, D Staknis, HB Nguyen, FC Davis… - Science, 1998
Abstract: The mouse Clock gene encodes a bHLH-PAS protein that regulates circadian rhythms and is related to transcription factors that act as heterodimers. Potential partners of CLOCK were isolated in a two-hybrid screen, and one, BMAL1, was coexpressed with ...
Cited by 915

You could even run a metric like 915 citations / 13 yrs in print = 70 citations/yr.
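To make that metric concrete, here is a minimal Python sketch, assuming you have already collected titles, publication years, and citation counts by hand or from an export. The `Paper` class, the function names, and the two placeholder entries are hypothetical and only there for illustration; the reference year 2011 is an assumption derived from the "13 yrs in print" figure for the 1998 paper.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    year: int
    citations: int


def citations_per_year(paper: Paper, current_year: int) -> float:
    """Citations divided by years in print (at least 1 to avoid dividing by zero)."""
    years_in_print = max(current_year - paper.year, 1)
    return paper.citations / years_in_print


def rank_by_citation_rate(papers: list[Paper], current_year: int) -> list[Paper]:
    """Return papers sorted from highest to lowest citations per year."""
    return sorted(
        papers,
        key=lambda p: citations_per_year(p, current_year),
        reverse=True,
    )


papers = [
    # Real example from the search result quoted above.
    Paper("Role of the CLOCK protein in the mammalian circadian mechanism", 1998, 915),
    # Made-up placeholder entries, only to show how the ranking behaves.
    Paper("Hypothetical older paper", 1985, 1000),
    Paper("Hypothetical recent paper", 2009, 300),
]

for p in rank_by_citation_rate(papers, current_year=2011):
    print(f"{citations_per_year(p, 2011):6.1f} citations/yr  {p.title}")
```

Note that a per-year rate favors recent papers that are cited heavily early on, so it is worth looking at it alongside the raw citation count rather than instead of it.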


This seems nice. I have yet to check whether I can get the kind of statistics and information I'm hoping for. Thank you!

Reply written 7.6 years ago by Arun (2.3k)
7.6 years ago by
United States
Zev.Kronenberg11k wrote:

Check out F1000.

They usually have great insight. In fact just browsing the website would be sufficient.


The two studies to date on the topic actually show that F1000 reviews miss the majority of highly cited papers:

Reply written 7.6 years ago by Casey Bergman (18k)

Thanks, Zev. It seems to be a paid service, and the studies above don't help its case either.

Reply written 7.6 years ago by Arun (2.3k)
7.6 years ago by
Rochester, NY USA
Alex Paciorkowski3.3k wrote:

A resource like Arrowsmith has sometimes been helpful for broadening/deepening literature mining expeditions. I often find a lot of interesting connections that are not picked up by PubMed alone. I'm not sure they can give you the "most cited" data, but you could send the list through Google Scholar to get that metric, as Larry suggests above.


Thanks for this link, Alex.

Reply written 7.6 years ago by Arun (2.3k)
7.6 years ago by
Washington University School of Medicine, St. Louis, USA
Malachi Griffith17k wrote:

How about using LigerCat?

LigerCat: using "MeSH Clouds" from journal, article, or gene citations to facilitate the identification of relevant biomedical literature.

This publication also sounds relevant: A document clustering and ranking system for exploring MEDLINE citations

The LigerCat tag cloud for 'circadian rhythm' is provided below as an example:

[Image: LigerCat tag cloud for 'circadian rhythm']
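If you want to build a similar cloud yourself, the core of it is a term-frequency count over the MeSH headings attached to each article. The sketch below is a generic illustration in Python, not LigerCat's actual method; it assumes you already have the MeSH terms per article (fetching them from PubMed is not shown), and the function name and the sample records are hypothetical.

```python
from collections import Counter


def mesh_term_frequencies(records, top_n=None):
    """Count in how many articles each MeSH term appears.

    `records` is a list of articles, each given as a list of MeSH term strings.
    Each term is counted at most once per article, so the result is a document
    frequency that can be used directly as a word cloud weight.
    """
    counts = Counter()
    for terms in records:
        # Normalize whitespace and case, and drop duplicates within one article.
        unique_terms = {t.strip().lower() for t in terms if t.strip()}
        counts.update(unique_terms)
    return counts.most_common(top_n)


# Made-up sample records, only to show the shape of the input.
records = [
    ["Circadian Rhythm", "CLOCK Proteins", "Mice"],
    ["Circadian Rhythm", "Arabidopsis", "Circadian Rhythm"],
    ["Circadian Rhythm", "CLOCK Proteins", "Humans"],
]

for term, n_articles in mesh_term_frequencies(records):
    print(f"{n_articles}\t{term}")
```

The resulting (term, count) pairs can be written to a tab-separated file and passed to whatever word cloud tool you prefer, including the R one mentioned in the question.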


This is so far the coolest one I have seen. Let me try this out and I'll get back to you.

Reply written 7.6 years ago by Arun (2.3k)
Powered by Biostar version 2.3.0