I recently ran InterProScan on predicted ORFs longer than 100 aa. I then used the PFAM_DBD and SUPERFAMILY_DBD database IDs in the hope of collecting TFs. Of course, many genes have both a Homeobox hit (PF00046) and another hit such as PAX (PF00292).
My question is: how do people estimate the number of TFs?
I simply took all the genes with a TF hit in the list and removed duplicates. Would this be a valid way to get the total number of TFs?
And how should I count specific families when a single gene can hit more than one TF domain?
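
To make the two counts concrete, here is a minimal sketch of this kind of tally, not a definitive method. It assumes InterProScan's tab-separated output, with the protein ID in column 1 and the signature accession in column 5. `DBD_IDS`, `count_tfs`, and the example rows are hypothetical placeholders; a real run would load the full PFAM_DBD and SUPERFAMILY_DBD ID lists and read the actual results file.

```python
from collections import defaultdict

# Hypothetical subset of DNA-binding-domain signature IDs (assumption:
# the real list would come from the PFAM_DBD / SUPERFAMILY_DBD files).
DBD_IDS = {"PF00046", "PF00292"}


def count_tfs(tsv_lines, dbd_ids):
    """Return (set of genes with any DBD hit, dict of family -> set of genes)."""
    genes_by_family = defaultdict(set)
    for line in tsv_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # skip malformed rows
        gene, signature = fields[0], fields[4]
        if signature in dbd_ids:
            # Using a set means repeated hits of one domain in one gene
            # are counted once.
            genes_by_family[signature].add(gene)
    # Total TFs: each gene counted once, however many families it hits.
    tf_genes = set().union(*genes_by_family.values())
    return tf_genes, dict(genes_by_family)


# Made-up example rows, for illustration only.
rows = [
    "geneA\tmd5a\t350\tPfam\tPF00046\tHomeobox\t10\t66\t1e-20\tT\t01-01-2020",
    "geneA\tmd5a\t350\tPfam\tPF00292\tPAX\t100\t220\t1e-30\tT\t01-01-2020",
    "geneB\tmd5b\t280\tPfam\tPF00046\tHomeobox\t40\t96\t1e-18\tT\t01-01-2020",
    "geneC\tmd5c\t410\tPfam\tPF00069\tPkinase\t5\t260\t1e-50\tT\t01-01-2020",
]

tf_genes, families = count_tfs(rows, DBD_IDS)
print("total TFs:", len(tf_genes))
for family, genes in sorted(families.items()):
    print(family, len(genes))
```

In this toy input, geneA counts once toward the total but appears under both PF00046 and PF00292, so the per-family counts add up to more than the total. Whether such a gene should be assigned to one family (for example PAX rather than Homeobox) is a separate classification decision that the sketch does not make.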