I am interested in finding all known transcription factor binding sites for a list of genes from the ENCODE dataset. How could one automate that? From the tables, it appears that each TF has its own table. Thus, I could obtain the promoters for my set of genes and intersect them with the table of the TF in question. But the number of tables is rather large, and the names do not follow any convention that I could discern. Is there a way to automate this process?
It may be possible to do this with table intersections, but personally I would simply download all the files from, e.g., the Yale lab (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/encodeDCC/wgEncodeYaleChIPseq/) and the HudsonAlpha lab (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/encodeDCC/wgEncodeHudsonalphaChipSeq). These are both for hg18, by the way, and there are other labs as well. Then I would look at the narrowPeak files, which contain predicted TF binding sites based on the ChIP-seq data.
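For what it's worth, here is a rough sketch of the intersection step once the narrowPeak files are downloaded. It assumes you already have a TSS position and strand for each gene, and that you label each file with a TF name yourself (the file names won't reliably give you one). The function names, the promoter window sizes (1000 bp upstream, 200 bp downstream) and the example coordinates are all made up for illustration. The only format detail it relies on is that the first three narrowPeak columns are chrom, start and end.

```python
import gzip
from collections import defaultdict


def open_maybe_gzip(path):
    """Open a plain or gzipped text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_narrowpeak(lines, tf_label):
    """Parse narrowPeak lines into (chrom, start, end, tf_label) tuples."""
    peaks = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional track/browser header lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        peaks.append((fields[0], int(fields[1]), int(fields[2]), tf_label))
    return peaks


def promoter_window(chrom, tss, strand, upstream=1000, downstream=200):
    """Return a (chrom, start, end) promoter interval around a TSS.

    The window sizes are arbitrary defaults, not an ENCODE convention.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    else:
        start, end = tss - downstream, tss + upstream
    return chrom, max(0, start), end


def tfs_per_gene(promoters, peaks):
    """Map each gene to the sorted list of TF labels with a peak in its promoter."""
    by_chrom = defaultdict(list)
    for chrom, start, end, tf in peaks:
        by_chrom[chrom].append((start, end, tf))
    hits = {}
    for gene, (chrom, p_start, p_end) in promoters.items():
        # Half-open intervals overlap when each starts before the other ends.
        hits[gene] = sorted(
            {tf for start, end, tf in by_chrom[chrom]
             if start < p_end and end > p_start}
        )
    return hits


# Toy data standing in for real files; gene names and coordinates are invented.
promoters = {
    "GENE_A": promoter_window("chr1", 10000, "+"),
    "GENE_B": promoter_window("chr2", 5000, "-"),
}
peaks = read_narrowpeak(["chr1\t9500\t9700\t.\t0\t.\t12.5\t-1\t-1\t100"], "TF1")
peaks += read_narrowpeak(
    ["chr2\t5900\t6100\t.\t0\t.\t8.0\t-1\t-1\t90",
     "chr1\t20000\t20100\t.\t0\t.\t5.0\t-1\t-1\t50"],
    "TF2",
)
result = tfs_per_gene(promoters, peaks)
print(result)
```

With real data you would call read_narrowpeak(open_maybe_gzip(path), label) once per downloaded file. The linear scan per promoter is fine for a gene list of modest size; for genome-wide use, a dedicated interval tool would be a better fit.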
If you go to the UCSC test browser, there is more data available: http://genome-test.cse.ucsc.edu/
From there you can go to the UCSC Table Browser, and in the group "Regulation" you may find more tracks. Also, if you are working with hg19, you can get hot spots and new peaks too.
Are you interested in TFBS from some specific lab, or in all the ENCODE TFBS in general?