PDF is a format for graphics or maybe English text. People should not use it for tables or any other data that they want to be reused.
For the special case of tables in PDFs, this problem is annoying and common enough (government data) that someone wrote a special converter just for tables in PDFs: https://github.com/jazzido/tabula. I have never tried it, but it includes special code to identify rows and to remove headers and page breaks, so it should work better than a general-purpose converter.
Mary: If authors report motifs as graphics in a PDF, then the only explanations I can see are that they don't want their data to be used or that they forgot to provide it. You should email Matthieu Blanchette and ask for the raw data, which he definitely has. He is most likely aware of the problem. (If he doesn't reply: one of my colleagues works for him.)
As a general-purpose solution, I have had very good results with OCR software like Omnipage or Abbyy. It often produces good XML, or at least HTML, from PDFs that for some reason fail with pdftotext. You can also give the Java-based pdfbox a try, or Python's pyPdf or pdfMiner cited above; in my hands there is not a lot of difference between these tools.
If you want to write something yourself, which I don't recommend, you need one of these PDF extraction libraries. They give you access to each individual character on every page and let you find out its font size, font type, position, etc. Cermine is supposedly a good tool for this, but I haven't tried it; see http://sciencesoft.web.cern.ch/node/120
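To give an idea of what "writing something yourself" involves, here is a minimal sketch of the first step: turning per-character records back into lines of text. The `Char` record and the `group_into_lines` function are hypothetical names made up for this illustration, not the API of any of the libraries mentioned; I am assuming the extraction library has already given you each character's text, position and font size, with y increasing towards the top of the page as is usual in PDF coordinates.

```python
from typing import NamedTuple


class Char(NamedTuple):
    """One extracted character (hypothetical record, not a real library type)."""
    text: str
    x: float     # horizontal position in points
    y: float     # baseline position in points (larger = higher on the page)
    size: float  # font size in points


def group_into_lines(chars, y_tolerance=2.0):
    """Group characters into lines by baseline, top of page first.

    Returns a list of (line_text, largest_font_size_on_line) tuples.
    """
    lines = []  # each entry: (baseline_y of first char, [Char, ...])
    for ch in sorted(chars, key=lambda c: -c.y):
        # Same line if the baseline is within the tolerance of the current line.
        if lines and abs(lines[-1][0] - ch.y) <= y_tolerance:
            lines[-1][1].append(ch)
        else:
            lines.append((ch.y, [ch]))

    result = []
    for _, members in lines:
        members.sort(key=lambda c: c.x)  # left-to-right reading order
        text = "".join(c.text for c in members)
        result.append((text, max(c.size for c in members)))
    return result
```

Real PDFs need much more than this (columns, rotated text, missing spaces between words), which is part of why I don't recommend going down this road.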
For anyone working in text mining, PDFs are a time-consuming obstacle but the de facto standard for scientific text. Tools like Papers or Google Scholar's parsers have to use various rules to find the author names, title and abstract in a PDF. They go for the biggest font on the first page (title), non-English text underneath (authors), and maybe a single paragraph of indented or bold text after that (abstract). Another technique is to look for a DOI, which is easily recognizable with a regular expression, and then look up the data in CrossRef.
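The DOI approach can be sketched in a few lines of Python. The regular expression below is a commonly used heuristic for modern DOIs, not an official definition, so some older DOIs will not match. The function names are mine, the DOI in the usage example is made up, and the code only builds the CrossRef REST API URL; actually fetching it is left out.

```python
import re
from urllib.parse import quote

# Heuristic (an assumption, not a standard): "10.", a 4-9 digit prefix,
# a slash, then anything up to whitespace, a quote or an angle bracket.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_doi(text):
    """Return the first DOI-like string in the text, or None if there is none."""
    match = DOI_RE.search(text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not to the DOI.
    return match.group(0).rstrip(".,;)")


def crossref_url(doi):
    """Build the CrossRef REST API URL for a DOI (no request is made here)."""
    return "https://api.crossref.org/works/" + quote(doi)
```

For example, `find_doi("Published online, doi:10.1234/abc.5678.")` gives `10.1234/abc.5678`, and `crossref_url` turns that into a URL that returns the metadata as JSON if the DOI is registered with CrossRef.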