Question: Generating A Citation Graph From A Set Of PDFs
Michael Schubert (Cambridge, UK) wrote 8.3 years ago:

I have a set of articles in Zotero, in PDF format, and would like to generate a citation graph from them, ideally with some statistics about how often the articles were cited in total and the possibility to fill in missing links.

Is there any tool that is able to do this? I've heard that Mendeley is/was capable of extracting references from PDFs, but I don't know what the status of this feature is. Other suggestions are welcome.

Alternatively, are there tools that just extract references from text, e.g. a collection of regular expressions for different journals? I have a little experience in Cytoscape plugin development and could code the visualization myself. If anyone is interested in this and would like to help, that is also welcome.
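As a rough illustration of the regular-expression idea, here is a minimal Python sketch that pulls DOIs and PubMed IDs out of already-extracted reference text. The patterns are heuristics, not a journal-specific collection, and the function name `extract_identifiers` and the sample reference (including its DOI and PMID) are made up for the example. It assumes the PDF text has already been converted to plain text by some other tool.

```python
import re

# Heuristic DOI pattern: "10." + registrant digits + "/" + suffix.
# This is an assumption that covers common modern DOIs, not every valid one.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')

# Heuristic PubMed ID pattern: the literal "PMID", optional colon, then digits.
PMID_RE = re.compile(r'\bPMID:?\s*(\d{1,8})\b', re.IGNORECASE)


def extract_identifiers(text):
    """Return the DOIs and PMIDs found in a block of reference text."""
    # Trailing punctuation usually belongs to the sentence, not the DOI.
    dois = [match.rstrip('.,;)') for match in DOI_RE.findall(text)]
    pmids = PMID_RE.findall(text)
    return {"dois": dois, "pmids": pmids}


if __name__ == "__main__":
    # Invented reference string, for illustration only.
    sample = "Doe J. (2009) Some title. doi:10.1234/example.5678. PMID: 12345678"
    print(extract_identifiers(sample))
```

References that carry neither a DOI nor a PMID would still need title and author parsing, which is where the per-journal patterns come in.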

Related: this question and Chris Miller on friendfeed, both without a satisfying answer. Maybe even Maltego could be used for this, but I don't know much about the software.

Istvan Albert (University Park, USA) wrote 8.3 years ago:

There are tools like cbib for parsing references from PDF files. The real challenge, however, is proper record linkage and de-duplication: identifying which records point to the same publication. That is an open research problem with few, if any, easily applicable tools.
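To make the linkage problem concrete, here is a deliberately crude Python sketch that groups parsed references by a normalized title-plus-year key. It is only a baseline heuristic under the assumption that each record is a dict with `title` and `year` fields (hypothetical field names); it does not solve the general problem, and it will miss abbreviated titles, typos, and wrong years.

```python
import re
import unicodedata
from collections import defaultdict


def match_key(title, year):
    """Build a crude comparison key from a title and a publication year."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase, then collapse every run of non-alphanumerics to one space.
    text = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return (text, str(year))


def group_duplicates(records):
    """Group records whose normalized title and year are identical."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record["title"], record["year"])].append(record)
    return list(groups.values())
```

Anything beyond exact key matches (fuzzy title similarity, author comparison, lookups against an external database) is where the real difficulty starts.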

Thanks for the cbib link. As for the "real challenge", wouldn't PubMed or Google Scholar already have solved that for me?

Reply by Michael Schubert, 8.3 years ago

It should help to some extent. I think it will all depend on the size, quality, and diversity of the corpus.

Reply by Istvan Albert, 8.3 years ago

Andra Waagmeester (Maastricht, the Netherlands) wrote 8.3 years ago:

I don't know of a tool that does it all. So +1 for your question. I would do it as follows:

  1. I would use Hubmed's citation finder to extract the references from the PDF.
  2. Subsequently, I would use Graphviz dot or Cytoscape to draw the citation network by providing the linked PMIDs in a text file. It would be very interesting if this were possible through a Cytoscape plugin. Please mention your Cytoscape plugin once it is finished.
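The second step can be sketched in a few lines of Python. This assumes a text file with one citation link per line in the form `citing_pmid cited_pmid`, separated by whitespace; that file format and the function names are assumptions made for the example, not something Hubmed produces directly. The sketch writes Graphviz DOT text and labels each node with how often it is cited within the collection.

```python
from collections import Counter


def parse_edges(lines):
    """Parse 'citing cited' pairs, skipping blank or malformed lines."""
    edges = []
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            edges.append((parts[0], parts[1]))
    return edges


def to_dot(edges):
    """Render citation edges as a Graphviz DOT digraph."""
    # In-degree = number of times a paper is cited within this collection.
    cited = Counter(target for _, target in edges)
    nodes = sorted({node for edge in edges for node in edge})
    out = ["digraph citations {"]
    for node in nodes:
        out.append(f'  "{node}" [label="{node} ({cited[node]})"];')
    for source, target in edges:
        out.append(f'  "{source}" -> "{target}";')
    out.append("}")
    return "\n".join(out)
```

Saving the returned string to a file such as `citations.dot` and running `dot -Tpng citations.dot -o citations.png` would draw the network. The counts only reflect citations inside the collection, not total citation numbers.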

Chris Evelo (Maastricht, The Netherlands) wrote 8.3 years ago:

If you have access to Thomson's Web of Science (especially the API), you might be able to use that, unless you want your results to be public, I guess. They know not only how often publications were cited but also from where, so they must already have collected the information you need.


sahar wrote 6.5 years ago:

Hi Michael, I have the same problem: I want to generate a citation graph for my PDFs. Have you found a solution?
