What is the recommended (hopefully free) tool for finding enrichment of transcription factor binding sites in a set of promoter sequences?
A grad student here had this very question at the beginning of her thesis. Like others here, we used TRANSFAC motifs. I would do that again, adding JASPAR to the mix. At the time, she did not know of any existing tools. We found two important considerations:
What defines the "promoter" or "gene control region" in humans? We settled on 5000 bp of upstream sequence + exon 1 + intron 1 (entire or up to the first 1000 bp, can't recall). Why intron 1? Because many gene control elements are found there.
When looking for enrichment, how do you define your set of control genes? By size (given that we took exon 1 and intron 1 data)? By GO categories? By gene position (say, the neighboring gene)? This was tough, and your solution may be specific to the genes you're examining or the question(s) you are after.
The student then ran MAPPER to identify the TRANSFAC motifs.
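To make the region definition above concrete, here is a minimal sketch of the coordinate arithmetic, assuming 1-based inclusive coordinates and that you already know the TSS position, strand, and the lengths of exon 1 and intron 1 from your annotation. The function name `promoter_region` and its parameters are made up for illustration and are not part of MAPPER or any other tool.

```python
def promoter_region(tss, strand, exon1_len, intron1_len,
                    upstream=5000, intron_cap=1000):
    """Return (start, end) in 1-based inclusive genomic coordinates.

    The region is `upstream` bp before the TSS, plus exon 1, plus
    intron 1 (capped at `intron_cap` bp; pass None to keep all of it).
    All names and defaults here are illustrative assumptions.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")

    # How much of intron 1 to keep.
    if intron_cap is None:
        intron_part = intron1_len
    else:
        intron_part = min(intron1_len, intron_cap)

    # Bases covered from the TSS onward, in the direction of transcription.
    downstream = exon1_len + intron_part

    if strand == "+":
        start = tss - upstream
        end = tss + downstream - 1
    else:
        # On the minus strand, "upstream" means higher coordinates.
        start = tss - downstream + 1
        end = tss + upstream

    # Clamp at the chromosome start; the end is NOT clamped because the
    # chromosome length is not known to this function.
    return max(1, start), end
```

For a plus-strand gene with its TSS at 10000, a 200 bp exon 1 and a 3000 bp intron 1, this gives a 6200 bp region (5000 + 200 + 1000). Whatever definition you pick, apply the same one to your control genes.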
The PAINT promoter analysis tool is my personal favorite. It will take a list of genes, find the upstream regions automatically, pass them through the free version of TRANSFAC, and then compare the enrichment to a background set of genes, either user-provided or chosen from the built-in options. Everything is quite automated and very customizable.
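For anyone wondering what "compare the enrichment to a background set" boils down to, here is a rough sketch of a generic over-representation test (one-sided hypergeometric). I am not claiming this is the exact statistic PAINT computes; the function name and arguments are my own, and it assumes the gene set is a subset of the background.

```python
from math import comb


def enrichment_pvalue(hits_in_set, set_size, hits_in_background, background_size):
    """One-sided hypergeometric p-value, P(X >= hits_in_set).

    hits_in_set        -- genes in your set whose promoter contains the motif
    set_size           -- number of genes in your set
    hits_in_background -- background genes with the motif (set included)
    background_size    -- total number of background genes (set included)
    """
    N, K, n, k = background_size, hits_in_background, set_size, hits_in_set
    upper = min(K, n)
    # Count the ways to draw at least k motif-containing genes, using exact
    # integer arithmetic, then divide once by the total number of draws.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)
```

You would call it once per motif, e.g. `enrichment_pvalue(12, 40, 300, 5000)`, and then correct for multiple testing across all the motifs you scanned.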
I would suggest customizing your favorite GO enrichment tool so that the background list of genes contains only the TFs, or genes with TF-related terms, and then performing the enrichment calculation. I tried this for one small analysis.
Another option is to use a published method like Modulator Inference by Network Dynamics (MINDy). Disclaimer: I have not tried MINDy myself.
Maybe RSAT? It seems to have a fairly broad collection of useful motif and CRM building and scanning tools, although I haven't used them myself yet so I can't tell you anything much about them. Web site/services are free but I think you have to register by post(?!) to install tools locally. http://rsat.ulb.ac.be/rsat/
I'd second MEME as a conservative approach. Try to see which patterns are stable over a range of promoter sizes, promoter subsets and cutoffs. Once you have those, switch to TOMTOM (part of the MEME suite) to map them to JASPAR or TRANSFAC matrices.
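The stability check can be as simple as counting how often each pattern shows up across runs. A minimal sketch, assuming you have already collected the motifs reported by each run into a set (parsing the MEME output is not shown, and the run labels and motif names below are made up):

```python
from collections import Counter


def stable_motifs(runs, min_fraction=1.0):
    """Return motifs found in at least `min_fraction` of the runs.

    runs -- dict mapping a run label (e.g. promoter size or cutoff)
            to the collection of motif identifiers reported in that run
    """
    # Count each motif once per run, even if a run lists it twice.
    counts = Counter(m for motifs in runs.values() for m in set(motifs))
    needed = min_fraction * len(runs)
    return sorted(m for m, c in counts.items() if c >= needed)


# Hypothetical results from three promoter sizes.
runs = {
    "500bp": {"motif_A", "motif_B"},
    "1000bp": {"motif_A", "motif_C"},
    "2000bp": {"motif_A", "motif_B"},
}
print(stable_motifs(runs))        # motifs present in every run
print(stable_motifs(runs, 0.6))   # motifs present in at least 60% of runs
```

This compares motifs by identifier only. In practice two runs may report slightly different versions of the same pattern, so you may need to cluster or compare the matrices first.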
That's actually one major difference between the various tools -- Dave, do you have a list of genes or promoter sequences? Many tools expect a list of genes because they have their own concept of what a promoter is. If you have CAGE or RNA-Seq data and would like to define yourself which promoters you are interested in, half of the existing systems won't be of use to you. Likewise, if you are working with a species not supported by the system, you'd be out of luck.