Question

How Difficult/Reliable Is It To Programmatically (Python) Look Up And Download Papers?

7

15.3 years ago

Rvidal ▴ 270

I know that it is possible to write a script that goes through the EZproxy server most universities use and downloads papers directly from a search query. I have seen a Perl implementation of this, but I was looking for something a bit cleaner, ideally in Python.

I don't mind if the script only works within a university network, but it would have to be able to check whether the paper is accessible from the current IP address or similar. I'm not sure how feasible this is, hence my question...
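A minimal sketch of the two pieces being asked about, not taken from any existing tool. It assumes the institution's EZproxy accepts the common starting-point URL form `https://<proxy-host>/login?url=<target>`; the host `ezproxy.example.edu` is hypothetical. "Is the paper accessible from this IP?" is approximated by fetching a direct PDF link and checking whether the response starts with the PDF magic bytes rather than being an HTML login or paywall page.

```python
import urllib.error
import urllib.request

# Hypothetical proxy host: substitute your institution's EZproxy address.
PROXY_HOST = "ezproxy.example.edu"


def ezproxy_url(target_url: str, proxy_host: str = PROXY_HOST) -> str:
    """Wrap a publisher URL in an EZproxy starting-point URL."""
    # Starting-point URLs conventionally append the raw target after "login?url=".
    return f"https://{proxy_host}/login?url={target_url}"


def looks_like_pdf(first_bytes: bytes) -> bool:
    """True if the payload starts with the PDF magic bytes."""
    # Checking the bytes is more reliable than Content-Type: paywalls return
    # HTML login pages, and some servers label real PDFs as octet-stream.
    return first_bytes.startswith(b"%PDF-")


def is_accessible(pdf_url: str, timeout: float = 30.0) -> bool:
    """Fetch pdf_url from the current network and report whether a PDF came back."""
    req = urllib.request.Request(pdf_url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return looks_like_pdf(resp.read(5))
    except urllib.error.HTTPError:
        # 401/403 and similar usually mean this IP has no subscription access.
        return False
```

On campus, calling `is_accessible()` on a publisher's direct PDF link roughly answers "does this IP have access?". A DOI usually resolves to an HTML landing page, so the check is only meaningful on a direct PDF link. Off campus, the same check against `ezproxy_url(...)` will usually land on the proxy's login page first unless a login step is added; whether EZproxy lets you through without one depends on the local configuration.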

literature python text • 4.7k views

updated 20 months ago by Ram 45k • written 15.3 years ago by Rvidal ▴ 270

Ram · Answer 1 · 2010-03-22

6

15.3 years ago

Giovanni M Dall'Olio 28k

Have you tried py-ezproxy?

The Perl script you saw may have been written with the WWW::Mechanize library. If so, take a look at mechanize for Python, which is a Python reimplementation of the same concept. Either way, you can use mechanize to connect through a proxy and do what you are asking for.
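The core of the approach is a "browser" object that routes requests through a proxy and keeps session cookies (EZproxy logins are cookie-based). As a swapped-in stand-in that runs without installing mechanize, here is the same idea with the standard library's `urllib.request`; the proxy address is hypothetical.

```python
import http.cookiejar
import urllib.request

# Hypothetical HTTP proxy: substitute the host and port your network actually uses.
PROXIES = {
    "http": "http://proxy.example.edu:8080",
    "https": "http://proxy.example.edu:8080",
}


def build_proxy_opener(proxies: dict) -> urllib.request.OpenerDirector:
    """Return an opener that sends requests through `proxies` and keeps cookies."""
    cookies = http.cookiejar.CookieJar()  # holds the session cookie after a login
    return urllib.request.build_opener(
        urllib.request.ProxyHandler(proxies),
        urllib.request.HTTPCookieProcessor(cookies),
    )


opener = build_proxy_opener(PROXIES)
opener.addheaders = [("User-Agent", "Mozilla/5.0")]
# response = opener.open("https://www.example.org/some-article.pdf")  # network call
```

With mechanize itself the equivalent is roughly `br = mechanize.Browser()`, then `br.set_proxies(PROXIES)` and `br.open(url)`; mechanize adds form filling and link following on top, which is what makes scripted logins convenient.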

updated 20 months ago by Ram 45k • written 15.3 years ago by Giovanni M Dall'Olio 28k

1

Mechanize is a great tool. You could even go and use twill (a library built on Mechanize) to simplify what you are after even further: http://pypi.python.org/pypi/twill/0.9

reply written 15.3 years ago by Istvan Albert 102k

0

I second twill. Much easier to use than Mechanize for most simple tasks.

reply written 15.2 years ago by Pansapiens ▴ 30

score 3 · Answer 2 · 2010-03-24

3

15.3 years ago

Ian Simpson ▴ 960

We have written scripts using Mechanize for this in the past, and they picked up ~85-90% of the articles for which PDFs were available. The scripts crawled through the pages, following links, redirects, etc., looking for the PDFs.
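One step of such a crawler is pulling candidate PDF links out of an article page. The sketch below is not the script described above; it is a stdlib-only illustration using `html.parser`. It collects `<a href>` links that mention "pdf" (a crude heuristic) plus the `citation_pdf_url` meta tag that many publisher pages include for indexers. The example page and URLs are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkFinder(HTMLParser):
    """Collect candidate PDF URLs from a single HTML page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "citation_pdf_url" and attrs.get("content"):
            # Many publisher pages advertise the PDF location in this meta tag.
            self._add(attrs["content"])
        elif tag == "a" and attrs.get("href") and "pdf" in attrs["href"].lower():
            # Heuristic: the link target mentions "pdf".
            self._add(attrs["href"])

    def _add(self, href: str):
        url = urljoin(self.base_url, href)  # resolve relative links against the page URL
        if url not in self.links:
            self.links.append(url)


def find_pdf_links(html: str, base_url: str) -> list[str]:
    """Return candidate PDF URLs in the order they appear on the page."""
    finder = PdfLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.links
```

A full crawler would open each candidate, follow any redirects, and keep the first response whose body actually starts with `%PDF-`.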

So you could go that way, but you might also want to take a look at Pubget (http://pubget.com/). I haven't looked closely, but they have an API that might do the hard work for you. As I said, I don't know how good its return rate is.

written 15.3 years ago by Ian Simpson ▴ 960

0

OK, it looks like the API requires the article to have a DOI, so you MAY have some issues with older articles.
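If a DOI-only service is used, a simple precaution is to split the input list up front so articles without a usable DOI can be routed elsewhere. This sketch uses the pattern Crossref has suggested for matching modern DOIs; it is a heuristic that misses some older, oddly formatted DOIs. The `(identifier, doi)` input shape and the sample values are illustrative assumptions.

```python
import re

# Crossref-suggested pattern for modern DOIs; some legacy DOIs will not match.
DOI_RE = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def split_by_doi(records):
    """Split (identifier, doi_or_None) pairs into usable-DOI and no-DOI groups."""
    with_doi, without_doi = [], []
    for ident, doi in records:
        if doi and DOI_RE.match(doi.strip()):
            with_doi.append((ident, doi.strip()))
        else:
            without_doi.append(ident)
    return with_doi, without_doi
```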

reply written 15.3 years ago by Ian Simpson ▴ 960

0

This may be a limitation of the API only, since the web interface works as a reasonable proxy for a PubMed search, although at present it doesn't support the [tag] field searches I really like using in PubMed (e.g. bloggs_j[1AU]). Still, it looks pretty useful.
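Field-tagged queries such as `bloggs_j[1AU]` can also be sent straight to PubMed through NCBI's E-utilities, which returns matching PMIDs (in JSON mode, under `esearchresult` → `idlist`). A minimal sketch that only builds the request URL; fetching it is one `urllib.request.urlopen` call. NCBI rate-limits unauthenticated clients, so batch your queries.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, retmax: int = 100) -> str:
    """Build an E-utilities esearch URL for a PubMed query; field tags pass through as-is."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)  # urlencode escapes the [ ] of field tags
```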

reply written 15.3 years ago by Ian Simpson ▴ 960

0

That API actually looks good for a web app I'm working on. Thanks! However, my original question is aimed more at a Python script that would retrieve a set of papers given a list of, say, PubMed IDs. I will look into Mechanize and see how that goes.
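For the "list of PubMed IDs" case, a practical first step is mapping each PMID to its DOI, which can then be resolved via `https://doi.org/<doi>` (optionally wrapped in your EZproxy URL). This sketch uses NCBI's esummary endpoint. The parsing assumes the JSON layout esummary returns for PubMed (`result` → `uids`, and per-record `articleids` entries with `idtype`/`value`); check it against a live response. The sample PMIDs in the usage below are made up.

```python
import json
import urllib.request
from urllib.parse import urlencode

ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(pmids) -> str:
    """Build one esummary request covering a batch of PubMed IDs (JSON output)."""
    return ESUMMARY + "?" + urlencode({"db": "pubmed", "id": ",".join(pmids), "retmode": "json"})


def dois_from_esummary(data: dict) -> dict:
    """Map each PMID in an esummary payload to its DOI, or None if none is listed."""
    result = data.get("result", {})
    out = {}
    for uid in result.get("uids", []):
        doi = None
        for aid in result.get(uid, {}).get("articleids", []):
            if aid.get("idtype") == "doi":
                doi = aid.get("value")
                break
        out[uid] = doi
    return out


def fetch_dois(pmids) -> dict:
    """Network call: look up DOIs for a list of PMIDs via E-utilities."""
    with urllib.request.urlopen(esummary_url(pmids), timeout=30) as resp:
        return dois_from_esummary(json.load(resp))
```

Usage would look like `fetch_dois(["111", "222"])`. PMIDs that come back with `None` are the ones that need another route, such as searching the publisher site by title.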

reply written 15.3 years ago by Rvidal ▴ 270