Faster way to download PDBe PISA data
7.4 years ago
Louis ▴ 150

I'm downloading PDBe PISA data as advised on the download page, via successive calls with comma-separated lists of PDB codes such as 4foj,3ga7,4err.

The download time varies: for the last few sets of 50 interfaces it has been roughly 3 to 6 minutes per set of IDs, so the 2000+ queries could take between 100 and 200 hours in total!

I'm wondering if it's common practice to parallelise this in some way (e.g. running half of the download on another machine), or if anybody out there has a full set that they could share?
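
For anyone wanting to try the parallel route, here is a minimal sketch of one possible approach, not an established practice: split the ID list into batches and let a small thread pool fetch several batches at once. The names `chunked`, `fetch_all` and `fetch_batch` are hypothetical. `fetch_batch` stands in for whatever code performs the actual PISA request, since the exact call is not given in this post. Whether the PDBe servers tolerate concurrent requests is also an assumption, so the worker count is kept deliberately low.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(ids, size):
    """Split a list of PDB codes into consecutive batches of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def fetch_all(ids, fetch_batch, batch_size=50, max_workers=2):
    """Fetch every batch using a small thread pool.

    `fetch_batch` is a hypothetical callable: it takes a list of PDB codes
    and returns the response text for that batch.
    Returns a dict mapping each comma-joined batch to its response.
    """
    batches = chunked(ids, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input batches.
        responses = list(pool.map(fetch_batch, batches))
    return {",".join(batch): resp for batch, resp in zip(batches, responses)}
```

Threads suit this job because the time is spent waiting on the network rather than computing. With `max_workers=2` the best case is about half the wall-clock time, assuming the server does not throttle.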

I would have considered contacting PDBe but their site makes it pretty clear this is the recommended way to download the information.

P.S. the Python script I wrote for the task here allows for interrupted/resumed download if anyone reading this is seeking to do the same.
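
The linked script is not reproduced here, but the interrupted/resumed idea can be sketched as follows. This is an illustration under stated assumptions, not the script itself: each batch is saved to its own file, and any batch whose file already exists is skipped on the next run. `download_resumable` and `fetch_batch` are hypothetical names, and the `.xml` extension is an assumption about the response format.

```python
from pathlib import Path


def download_resumable(ids, fetch_batch, out_dir, batch_size=50):
    """Download batches of PDB codes, skipping batches saved by an earlier run.

    `fetch_batch` is a hypothetical callable: list of PDB codes -> response text.
    Returns the names of the files written during this call.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        # File name is derived from the batch offset (".xml" is an assumption).
        target = out_dir / f"batch_{start:06d}.xml"
        if target.exists():
            continue  # already completed in a previous run
        text = fetch_batch(batch)
        # Write to a temporary file and rename it afterwards, so an
        # interrupted write never leaves a file that looks complete.
        tmp = target.with_name(target.name + ".part")
        tmp.write_text(text)
        tmp.replace(target)
        written.append(target.name)
    return written
```

Because file names depend on the batch offset, the ID list and `batch_size` need to stay the same between runs for the skip logic to line up.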

database Python • 2.6k views
25 days ago
Wayne ▴ 510

I've got a more modern, Pandas-integrated way to do this that I post about here. Snakemake allows you to just provide a list of PDB identifiers for the script to process. The results are Pandas dataframes that integrate easily with downstream analysis.

