Question: Generating a Citation Graph from a set of PDFs

I have a set of articles in Zotero, as PDFs, and would like to generate a citation graph from them, ideally with some statistics on how often each article was cited in total and the possibility to fill in missing links.

Is there any tool that can do this? I've heard that Mendeley is (or was) capable of extracting references from PDFs, but I don't know the current status of this feature. Other suggestions are welcome.

Alternatively, are there tools that just extract references from text, e.g. a collection of regular expressions for different journals? I have a little experience in Cytoscape plugin development and could code the visualization myself (if anyone is interested in this and would like to help, that is also welcome).
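As a rough illustration of the regular-expression approach, here is a minimal Python sketch that pulls DOIs and PubMed IDs out of reference text that has already been extracted from a PDF (the PDF-to-text step is assumed and not shown). The patterns are simplified assumptions, not journal-specific rules, and will miss or over-match some real references; the sample text is made up.

```python
import re

# Simplified DOI pattern (assumption): "10.", a 4-9 digit registrant code, "/",
# then a suffix without whitespace. Some real DOIs will not be handled correctly.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
# Assumed convention: PMIDs appear as "PMID: 12345678" or "PMID 12345678".
PMID_RE = re.compile(r"\bPMID:?\s*(\d{1,8})\b")


def extract_identifiers(text):
    """Return sorted, de-duplicated DOIs and PMIDs found in a block of reference text."""
    # Trailing punctuation such as "." or "," often sticks to a DOI at the end of a reference.
    dois = {m.rstrip(".,;)") for m in DOI_RE.findall(text)}
    pmids = set(PMID_RE.findall(text))
    return sorted(dois), sorted(pmids)


if __name__ == "__main__":
    # Made-up reference list for illustration only.
    sample = (
        "1. Smith J et al. A study. J Example. 2010. doi:10.1000/xyz123.\n"
        "2. Doe A. Another study. PMID: 12345678\n"
    )
    print(extract_identifiers(sample))
```

Identifiers such as DOIs and PMIDs are much easier to link across papers than free-text citations, which is why the sketch targets them rather than author/title strings.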

Related: this question and a post by Chris Miller on FriendFeed, both without a satisfying answer. Maltego might even be usable for this, but I don't know much about the software.


4 answers


There are tools like cbib for parsing references from PDF files. The real challenge, though, is not parsing but proper record linkage and de-duplication: identifying which records point to the same publication. That is an open research problem with few, if any, easily applicable tools.
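To make the de-duplication problem concrete, here is a deliberately naive Python sketch: it normalizes reference titles (lowercase, punctuation stripped, whitespace collapsed) and greedily groups titles whose normalized forms are similar according to `difflib.SequenceMatcher.ratio()`. The title-only comparison and the 0.9 threshold are assumptions for illustration; real record linkage would also compare authors, year and venue, and this is not how cbib or any other specific tool works.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title):
    """Lowercase, replace non-alphanumeric runs with spaces and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def group_duplicates(titles, threshold=0.9):
    """Greedily assign each title to the first group whose representative is similar enough.

    Returns a list of groups, each a list of indices into `titles`.
    The default threshold is an arbitrary assumption, not a tuned value.
    """
    groups = []  # list of (normalized representative, [indices])
    for i, title in enumerate(titles):
        norm = normalize_title(title)
        for rep, members in groups:
            if SequenceMatcher(None, rep, norm).ratio() >= threshold:
                members.append(i)
                break
        else:
            # No sufficiently similar group found: start a new one.
            groups.append((norm, [i]))
    return [members for _, members in groups]


if __name__ == "__main__":
    # Made-up titles; the first two differ only in case and punctuation.
    refs = [
        "Citation graphs from PDF files",
        "citation graphs from PDF-files.",
        "Record linkage for references",
    ]
    print(group_duplicates(refs))
```

The greedy, pairwise comparison is quadratic in the number of references and sensitive to the threshold, which illustrates why the problem is hard to solve well at scale.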

 

Thanks for the cbib link. As for the "real challenge", wouldn't PubMed or Google Scholar already have solved that for me?

written 2.1 years ago by Michael Schubert
 

It should help to some extent. I think it will all depend on the size, quality and diversity of the corpus.

written 2.1 years ago by Istvan Albert
 

I don't know of a tool that does it all, so +1 for your question. I would do it as follows:

  1. I would use HubMed's citation finder (http://www.hubmed.org/citation.htm) to extract the references from the PDF.
  2. Then I would use Graphviz dot or Cytoscape to draw the citation network from a text file listing the linked PMIDs. It would be very interesting if this were possible through a Cytoscape plugin, so please mention yours once it is finished.
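The second step can be sketched in Python roughly as follows. It assumes the extracted links are already available as (citing PMID, cited PMID) pairs; it writes them as a Cytoscape SIF file (one `source<TAB>interaction<TAB>target` line per edge, with `cites` as an arbitrary interaction label) and as a Graphviz DOT digraph, and counts how often each PMID is cited within the collection (its in-degree). That count only covers citations inside your own set of PDFs, not total citations. The PMIDs and file names below are hypothetical placeholders.

```python
from collections import Counter


def to_sif(edges, interaction="cites"):
    """Cytoscape SIF: one 'source<TAB>interaction<TAB>target' line per edge."""
    return "".join(f"{src}\t{interaction}\t{dst}\n" for src, dst in edges)


def to_dot(edges, name="citations"):
    """Graphviz DOT digraph with an edge from each citing to each cited PMID."""
    lines = [f"digraph {name} {{"]
    lines += [f'  "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines) + "\n"


def citation_counts(edges):
    """How often each PMID is cited within this collection (in-degree).

    Duplicate (citing, cited) pairs are counted once.
    """
    return Counter(dst for _, dst in set(edges))


if __name__ == "__main__":
    # Hypothetical (citing, cited) PMID pairs; replace with the extracted links.
    edges = [("111", "222"), ("111", "333"), ("222", "333")]
    with open("citations.sif", "w") as fh:  # hypothetical file names
        fh.write(to_sif(edges))
    with open("citations.dot", "w") as fh:
        fh.write(to_dot(edges))
    print(citation_counts(edges).most_common())
```

The SIF file can be imported into Cytoscape as a network, and the DOT file can be rendered with Graphviz, e.g. `dot -Tpng citations.dot -o citations.png`.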
 

If you have access to Thomson's Web of Science (especially its API), you might be able to use that, unless you want your results to be public, I guess. They know not only how often publications were cited but also by which publications, so they must already have collected the information you need.

 

Hi Michael, I have the same problem and want to generate a citation graph for my PDFs. Have you found a solution?
