Question: PDB Structures, GO Terms, Clustering and Distance
I have two questions regarding GO terms:

1) I would like to cluster PDB proteins using their GO terms. Are there any available datasets for this?

2) Are there any metrics that give a distance between proteins based on their GO terms (similar to the distance we can compute between a pair of protein sequences)? I need such a metric to quantify how functionally different two proteins are. Thanks in advance.


1) You can derive GO terms for your PDB structures in several ways and then use those terms for clustering:

  • SCOP domain-level annotation using SCOP2GO
  • Transfer of GO terms to individual protein chains using the PDB Advanced Search interface
  • SIFTS annotations, which map PDB chains to UniProt annotations (use pdb_chain_go.lst)
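
For the SIFTS route, the per-chain GO terms can be collected into sets before clustering. The following is a minimal Python sketch, not a definitive parser: it assumes a tab-separated table with a header row containing columns named PDB, CHAIN and GO_ID, and comment lines starting with "#". Check those assumptions against the file you actually download. The function name load_chain_go and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def load_chain_go(handle):
    """Map 'PDBID_CHAIN' -> set of GO IDs from a tab-separated table.

    Assumes a header row with PDB, CHAIN and GO_ID columns and
    comment lines that start with '#'.
    """
    # Drop comment lines so the first remaining line is the header row.
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    chain_go = defaultdict(set)
    for row in reader:
        key = f"{row['PDB']}_{row['CHAIN']}"
        chain_go[key].add(row["GO_ID"])
    return dict(chain_go)


# Made-up sample rows in the assumed layout (not real annotations).
sample = io.StringIO(
    "# hypothetical comment line\n"
    "PDB\tCHAIN\tSP_PRIMARY\tGO_ID\n"
    "1abc\tA\tP00001\tGO:0005515\n"
    "1abc\tA\tP00001\tGO:0003824\n"
    "2xyz\tB\tP00002\tGO:0016787\n"
)
chain_go = load_chain_go(sample)
print(chain_go)
```

Each chain then has a set of GO IDs that can be fed into whatever distance or clustering method you choose.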

2) If you have two GO terms, you can compute the semantic similarity between them (see this review for background on the basic concepts). Packages such as GOSemSim support these computations; an extended list of tools is available here.
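
To illustrate the idea at the protein level, here is a minimal Python sketch of one simple graph-based measure: each protein's GO terms are extended with all of their ancestors, and the distance is one minus the Jaccard index of the two extended sets (this union-intersection idea is sometimes called simUI). It is a stand-in for illustration, not what GOSemSim does internally. The toy ontology, its term names and the function names are all hypothetical; a real analysis would load the parent relations from the GO release itself.

```python
# Toy child -> parents map standing in for the GO graph (hypothetical terms).
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "hydrolase": {"catalysis"},
    "peptidase": {"hydrolase"},
    "protein_binding": {"binding"},
}


def with_ancestors(terms, parents):
    """Return the given terms together with all of their ancestors."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, ()))
    return seen


def go_distance(terms_a, terms_b, parents):
    """1 - Jaccard index of the ancestor-extended term sets."""
    a = with_ancestors(terms_a, parents)
    b = with_ancestors(terms_b, parents)
    union = a | b
    if not union:
        raise ValueError("both proteins are unannotated")
    return 1.0 - len(a & b) / len(union)


close = go_distance({"peptidase"}, {"hydrolase"}, PARENTS)
far = go_distance({"peptidase"}, {"protein_binding"}, PARENTS)
print(close, far)
```

In this toy graph the peptidase/hydrolase pair comes out closer than the peptidase/protein_binding pair, because the first pair shares most of its ancestors. A matrix of such pairwise distances can be passed to any standard clustering method.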

Hi Reyhaneh,

You can download a file of GO annotations (gene association file) for PDB structures from the UniProt-GOA FTP site here.

The file format is described in the associated readme.
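
As a rough illustration of reading such a file, here is a minimal Python sketch. It assumes the standard tab-separated GAF layout, in which comment lines start with "!", column 2 holds the object ID, column 5 the GO ID and column 9 the aspect (F, P or C); the readme remains the authoritative description. The function name read_gaf and the sample rows, including the form of the object IDs, are made up for illustration.

```python
import io
from collections import defaultdict


def read_gaf(handle, aspect=None):
    """Map object ID -> set of GO IDs, optionally keeping one aspect only."""
    annotations = defaultdict(set)
    for line in handle:
        # Skip comment lines ('!') and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete annotation row
        obj_id, go_id, row_aspect = cols[1], cols[4], cols[8]
        if aspect is None or row_aspect == aspect:
            annotations[obj_id].add(go_id)
    return dict(annotations)


# Made-up rows in the assumed layout (not real annotations).
rows = [
    ["PDB", "1abc_A", "", "", "GO:0003824", "PMID:0", "IEA", "", "F"],
    ["PDB", "1abc_A", "", "", "GO:0005737", "PMID:0", "IEA", "", "C"],
    ["PDB", "2xyz_B", "", "", "GO:0005515", "PMID:0", "IEA", "", "F"],
]
sample_text = "!hypothetical comment\n" + "\n".join("\t".join(r) for r in rows) + "\n"

# Keep only molecular function (aspect 'F') terms.
mf = read_gaf(io.StringIO(sample_text), aspect="F")
print(mf)
```

Filtering on aspect "F" restricts the comparison to molecular function terms, which is probably the relevant part when the goal is to say how functionally different two proteins are.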

I hope this helps.


