How to download a list of PDB codes contained in a CSV file using
3.1 years ago
jmungar2 ▴ 10

Hello,

I have a CSV file containing a few hundred PDB codes. I'm trying to figure out how to use Biopython to download these PDB files, using each row of my CSV file as input. I've seen that the PDBList module can download PDB files when the PDB code is written into the download_pdb script, but I was wondering whether there is a way to tell PDBList to get those PDB codes from the CSV file.

I got to this and don't know how to progress from here:

from Bio.PDB import PDBList

csv = pd.read_csv("filename.csv", header = None)
csv_df = pd.DataFrame(csv)
csv_df.items()
for row in csv_df.items():
    pdbl.download_pdb_files(row)

Any ideas?

Thank you very much

Juan

Biopython • 1.4k views

Brilliant! Thanks a lot!

3.1 years ago
Mensur Dlakic ★ 27k

When you say CSV file, do you actually mean a single-column list of PDB IDs? If so, this will do the trick without Biopython:

cat filename.csv | xargs -I {} echo 'wget -q -o /dev/null ftp://ftp.ebi.ac.uk/pub/databases/msd/pdb_uncompressed/pdb{}.ent ; mv pdb{}.ent {}.pdb' > script.sh
source script.sh ; rm script.sh

Make sure you are in a directory where you want all these files to be downloaded. It may take a while and you will not see the progress. If you wish to see it, remove the -q -o /dev/null part from the first line.
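
If you would rather stay within Biopython, here is a minimal sketch of one way to do it. It assumes the CSV has a single column with no header row. The function names read_pdb_codes and download_structures and the pdb_files output directory are made up for illustration; retrieve_pdb_file is the PDBList method that fetches one entry at a time.

```python
import csv


def read_pdb_codes(csv_path):
    """Return the PDB codes found in the first column of a header-less CSV file."""
    codes = []
    with open(csv_path, newline="") as handle:
        for row in csv.reader(handle):
            # Skip blank lines and rows whose first cell is empty.
            if row and row[0].strip():
                codes.append(row[0].strip())
    return codes


def download_structures(codes, out_dir="pdb_files"):
    """Download each code with Biopython and return the local file paths."""
    # Imported here so read_pdb_codes() works even without Biopython installed.
    from Bio.PDB import PDBList

    pdbl = PDBList()
    paths = []
    for code in codes:
        # file_format="pdb" asks for the legacy PDB format (assumed to be wanted here).
        paths.append(pdbl.retrieve_pdb_file(code, pdir=out_dir, file_format="pdb"))
    return paths
```

Calling download_structures(read_pdb_codes("filename.csv")) should then fetch every entry in the file. Note that Biopython names the files along the lines of pdbXXXX.ent, so they may still need renaming if you want XXXX.pdb.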
