Question: Grouping protein domains into categories?
I want to compare the domain composition of a few of my cell samples.

After mapping peptide sequences to Pfam, I have thousands of domain hits across my dataset, so comparing samples at the level of individual domains hardly makes sense.

Are there any references/databases for domain categories? Something similar to the clans/superfamilies in Pfam but broader, e.g. kinases, zinc fingers, ankyrins...?

To my knowledge, no - I don't think it gets much broader than "Zinc-finger containing protein" if you're specifically interested in the domains themselves.

The next level up would be pathway/function ontologies such as KEGG or GO, but it's not entirely the same thing.

Thank you!!!
So there's no way to categorise them other than manually going through the domain names? Currently they look like zf-C3HC4, zf-C2H2, zf-RING2...

AFAIK, no; but I won’t claim to be an expert here.

If your dataset is reasonably well organised/curated you may be able to get some of the way with assorted regex magic, but that’s rarely the case with annotations.
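To give an idea of what the regex route could look like, here is a minimal Python sketch. The category names and patterns in `CATEGORY_RULES` are hand-written assumptions for illustration only (not a curated classification), and `categorise` / `category_counts` are hypothetical helper names. It assumes you have the Pfam short names of your hits as a plain list of strings.

```python
import re
from collections import Counter

# Hypothetical, hand-written rules: (broad category, regex on the Pfam short name).
# These patterns are illustrative assumptions and will need tuning for real data.
CATEGORY_RULES = [
    ("zinc finger", re.compile(r"^zf-", re.IGNORECASE)),
    ("kinase", re.compile(r"kinase", re.IGNORECASE)),
    ("ankyrin", re.compile(r"^Ank(_\d+)?$", re.IGNORECASE)),
]


def categorise(domain_name):
    """Return the first category whose pattern matches, else 'uncategorised'."""
    for category, pattern in CATEGORY_RULES:
        if pattern.search(domain_name):
            return category
    return "uncategorised"


def category_counts(domain_hits):
    """Count hits per broad category for one sample."""
    return Counter(categorise(name) for name in domain_hits)


# Toy example; replace with the domain names found in one of your samples.
sample_hits = ["zf-C3HC4", "zf-C2H2", "zf-RING2", "Pkinase", "Ank_2", "WD40"]
counts = category_counts(sample_hits)
print(counts)
```

Anything that ends up as "uncategorised" still needs a manual look, which is where this approach tends to break down on messy annotations.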

There are a few resources for organising protein domains:

  • CATH sorts domains by structure into class, architecture, topology and homologous superfamily.
  • ECOD sorts domains by evolutionary relationship into architecture, possible homology, homology, topology and family level.
  • SCOP and its later incarnations, SCOP2 and SCOPe, also sort domains by structure.

I would recommend the first ECOD paper as a good read on how protein domains can be organised and how these resources relate to each other.

Hopefully one of those resources will help you make sense of your domains.

Great!! Thank you for these!!

