I have a series of modules, each consisting of a list of gene symbols. I want to look at transcription factors (TFs) that may regulate each of these gene lists, i.e. control their expression. So far I have found resources that provide lists of target genes per transcription factor, such as the ENCODE data, but I have noticed that these lists tend to be very long, covering 50% of the genome or even more. For example, both the GTRD database and this ENCODE list have this problem. I was planning to run a series of hypergeometric tests to assess the overlap between each TF's target list and each of my modules. Can anyone suggest a better way of doing this, or tell me whether this is even a sensible approach? Are the results likely to be reliable? I am working with expression data from diseased human tissue.
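For concreteness, the planned test can be sketched as below. This is a minimal sketch under stated assumptions, not a reference pipeline. The function names and the toy gene sets are hypothetical. The background "universe" is assumed to be something like the genes detected in your expression data, and both the module and the TF target list are restricted to it before testing. The Benjamini-Hochberg step is an addition not mentioned above; it is included because testing many TFs against many modules produces many p-values. Only the standard library is used so the arithmetic is explicit.

```python
from math import comb


def hypergeom_overlap_pvalue(module, tf_targets, universe):
    """Return (overlap, P(X >= overlap)) for one module vs. one TF target list.

    Hypothetical helper. Both gene lists are restricted to `universe` first,
    so genes that could never have been observed do not affect the test.
    """
    universe = set(universe)
    module = set(module) & universe
    tf_targets = set(tf_targets) & universe
    N = len(universe)              # population: all background genes
    K = len(tf_targets)            # background genes that are TF targets
    n = len(module)                # genes drawn: the module
    k = len(module & tf_targets)   # observed overlap
    # Exact upper tail of the hypergeometric distribution using integer counts.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return k, tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data (hypothetical): 20 background genes, one module, two TFs.
universe = {f"G{i}" for i in range(1, 21)}
module = {"G1", "G2", "G3", "G4", "G5"}
tf_targets = {
    "TF_A": {"G1", "G2", "G3", "G4", "G10"},   # narrow target list
    "TF_B": {f"G{i}" for i in range(1, 11)},   # half of the background
}

names = sorted(tf_targets)
results = [hypergeom_overlap_pvalue(module, tf_targets[t], universe) for t in names]
raw_p = [p for _, p in results]
adj_p = benjamini_hochberg(raw_p)
for name, (k, p), q in zip(names, results, adj_p):
    print(f"{name}: overlap={k}, p={p:.4g}, BH-adjusted={q:.4g}")
```

In this toy example TF_B overlaps all five module genes yet gets a larger p-value than TF_A, because half of the background are TF_B targets anyway. That is the sense in which very broad target lists are not automatically enriched, but they do leave little room for specific signal. If SciPy is available, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` should give the same upper tail.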