Cross-referencing multiple databases with PDB
3 months ago
Mariana ▴ 40

Hello!

I have a csv file which comprises a database of hundreds of proteins, with various details about each protein. One of the columns contains the accession codes for the proteins, but the problem is that those codes are not standardized: for some it's the Uniprot id, for others the Genbank id, others the RefSeq id and, for very few, the PDB id. My goal is to get the PDB ids for all proteins in my database, if available.

I know UniProt has a good online tool for ID mapping, and the EMBL-EBI search could also be useful. Crossing UniProt with PDB is likely easy (through the REST API, for example), but is there a way to cross-reference GenBank and RefSeq IDs to PDB programmatically? I understand that I might need to map them to UniProt first and then to PDB, but I would like some suggestions.
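The two-step route described above (sort the mixed accessions by type, then map via UniProt to PDB) could be sketched roughly as follows. This is a minimal sketch, not a tested pipeline: the regular expressions are heuristics, the helper names (`classify_accession`, `mapping_form`, `run_mapping`) are hypothetical, and the database names and endpoint paths for UniProt's ID mapping service are assumptions that should be checked against the current UniProt REST documentation.

```python
import json
import re
import time
import urllib.parse
import urllib.request

API = "https://rest.uniprot.org"  # assumed base URL of the UniProt REST API

# Heuristic patterns; accession formats can overlap, so treat the result as a
# first guess and review anything labelled "unknown" by hand.
PATTERNS = [
    ("refseq", re.compile(r"(?:NP|XP|YP|WP|AP)_\d+(?:\.\d+)?")),
    ("uniprot", re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}")),
    ("genbank", re.compile(r"[A-Z]{3}\d{5}(?:\d{2})?(?:\.\d+)?")),
    ("pdb", re.compile(r"[0-9][A-Z0-9]{3}")),
]

# Assumed "from" database names for mapping to UniProtKB; verify before use.
TO_UNIPROT_FROM = {
    "refseq": "RefSeq_Protein",
    "genbank": "EMBL-GenBank-DDBJ_CDS",
}


def classify_accession(accession):
    """Guess which database an accession belongs to (heuristic only)."""
    acc = accession.strip().upper()
    for name, pattern in PATTERNS:
        if pattern.fullmatch(acc):
            return name
    return "unknown"


def mapping_form(ids, source_db, target_db):
    """Build the form fields for one ID mapping job."""
    return {"from": source_db, "to": target_db, "ids": ",".join(ids)}


def _get_json(url, data=None):
    # Sending data makes urllib issue a POST; otherwise it is a GET.
    with urllib.request.urlopen(url, data=data) as response:
        return json.load(response)


def run_mapping(ids, source_db, target_db, poll_seconds=3):
    """Submit a job, wait for it, and return the raw JSON (needs network)."""
    form = urllib.parse.urlencode(mapping_form(ids, source_db, target_db))
    job_id = _get_json(f"{API}/idmapping/run", data=form.encode())["jobId"]
    # Assumed behaviour: the status endpoint reports NEW/RUNNING until done.
    while _get_json(f"{API}/idmapping/status/{job_id}").get("jobStatus") in (
            "NEW", "RUNNING"):
        time.sleep(poll_seconds)
    return _get_json(f"{API}/idmapping/stream/{job_id}")
```

Under these assumptions, RefSeq and GenBank accessions would first go through something like `run_mapping(ids, TO_UNIPROT_FROM["refseq"], "UniProtKB")`, and the resulting UniProt accessions through `run_mapping(accs, "UniProtKB_AC-ID", "PDB")`. Accessions that already look like PDB ids can skip both steps.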

Thank you so much in advance!

genbank PDB uniprot embl-ebi refseq • 322 views

for some it's the Uniprot id, for others the Genbank id, others the RefSeq id and, for very few, the PDB id.

You realize that this will be a challenge. There may not always be a 1-to-1 relationship between these IDs.
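One way to cope with the lack of a 1-to-1 relationship is to keep every hit per query ID and flag the unmapped and ambiguous cases for manual review. The sketch below assumes mapping results arrive as a list of `{"from": ..., "to": ...}` rows, where `to` is either a plain ID or a record with a `primaryAccession` field; that row shape is an assumption about the mapping service's output, and the function names and sample rows are hypothetical.

```python
from collections import defaultdict


def group_mappings(rows):
    """Collect all target IDs per source ID, keeping one-to-many hits."""
    grouped = defaultdict(set)
    for row in rows:
        target = row["to"]
        if isinstance(target, dict):
            # Assumed shape when the target is a full UniProtKB record.
            target = target.get("primaryAccession")
        if target:
            grouped[row["from"]].add(target)
    return {source: sorted(targets) for source, targets in grouped.items()}


def summarize(query_ids, grouped):
    """List query IDs with no hit and those with more than one hit."""
    return {
        "unmapped": [q for q in query_ids if q not in grouped],
        "ambiguous": [q for q in query_ids if len(grouped.get(q, [])) > 1],
    }


# Illustrative rows only; the pairings are made up for the example.
rows = [
    {"from": "P11111", "to": "1AAA"},
    {"from": "P11111", "to": "2BBB"},
    {"from": "NP_000001.1", "to": {"primaryAccession": "P22222"}},
]
grouped = group_mappings(rows)
report = summarize(["P11111", "NP_000001.1", "Q33333"], grouped)
```

Keeping the full list per protein, rather than picking the first hit, leaves the choice of which structure to use (resolution, coverage, ligand state) as an explicit later step.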

3 months ago
Mensur Dlakic ★ 27k

There is a file named pdb_seqres.txt here that contains the sequences of all structurally solved proteins in PDB. It is updated weekly, so it is fairly current. Assuming that you have sequences for your proteins of interest, you can run a simple BLASTP search against this database. It should tell you whether the protein has been solved, or whether there are sufficiently close relatives with a known structure.
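A rough sketch of that approach is shown below. It assumes pdb_seqres.txt is FASTA-like, with headers of the form `>1abc_A mol:protein length:8  NAME`, and that NCBI BLAST+ (`makeblastdb`, `blastp`) is installed. The function names, file names, and the E-value cutoff are illustrative choices, not part of the original suggestion.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def protein_chains(lines):
    """Keep only protein records; yield (chain_id, sequence)."""
    for header, sequence in read_fasta(lines):
        fields = header.split()
        # Assumed header layout: "<pdbid>_<chain> mol:protein length:N NAME"
        if "mol:protein" in fields:
            yield fields[0], sequence


def write_fasta(records, path):
    """Write (identifier, sequence) pairs as a FASTA file."""
    with open(path, "w") as handle:
        for identifier, sequence in records:
            handle.write(f">{identifier}\n{sequence}\n")


def blast_commands(db_fasta, query_fasta, out_tsv, db_name="pdb_protein"):
    """Return the two BLAST+ command lines as argument lists."""
    return [
        ["makeblastdb", "-in", db_fasta, "-dbtype", "prot", "-out", db_name],
        ["blastp", "-query", query_fasta, "-db", db_name,
         "-evalue", "1e-5", "-outfmt", "6", "-out", out_tsv],
    ]
```

A possible usage: write the filtered chains with `write_fasta(protein_chains(open("pdb_seqres.txt")), "pdb_protein.fasta")`, then pass each list from `blast_commands(...)` to `subprocess.run(cmd, check=True)`. In the tabular output (`-outfmt 6`), the second column holds the matching `pdbid_chain` identifier and the third the percent identity, which helps separate exact matches from close relatives.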
