Hello, Biostars. I want to look for over-represented transcription factor binding sites in a given set of genes. I have learned a lot from similar questions on Biostars and from papers, but I still can't find a precise answer to one problem. According to papers, the promoter region can include the 5' UTR, introns, and sequence upstream or downstream of the TSS. For the upstream and downstream regions, I don't know how many bp I should take (the species is mouse). The window size differs from paper to paper, and I haven't found any basic principles for choosing it. Could you give me some suggestions or references? Thanks.
If you want to be rigorous, you could use EST data to define the UTR regions. The true distance will probably vary from gene to gene.
From methylation work in humans, my impression is that the methylation changes most strongly correlated with gene expression changes occur within ~1500 bp upstream and ~500 bp downstream of the TSS.
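If you go with a fixed window like that, the main pitfall is strand: on the minus strand, "upstream" means higher genomic coordinates. Below is a minimal sketch, assuming 0-based, half-open (BED-style) coordinates where `tss` is the position of the first transcribed base, and using the ~1500/~500 bp figures above as defaults. The function name and the example genes are hypothetical, for illustration only.

```python
def promoter_window(tss, strand, upstream=1500, downstream=500):
    """Return a (start, end) promoter window around a TSS.

    Coordinates are 0-based and half-open (BED-style). `tss` is the
    0-based position of the first transcribed base. The window covers
    `upstream` bp before the TSS plus `downstream` bp starting at the TSS.
    """
    if strand == "+":
        start = tss - upstream
        end = tss + downstream
    elif strand == "-":
        # Minus strand: transcription runs toward lower coordinates, so
        # "upstream" lies at higher coordinates than the TSS.
        start = tss - downstream + 1
        end = tss + upstream + 1
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    # Clip at the chromosome start; the end is not clipped here because
    # chromosome lengths are not known to this function.
    return max(0, start), end


if __name__ == "__main__":
    # Hypothetical genes: (chrom, tss, strand, name)
    genes = [("chr1", 10000, "+", "GeneA"), ("chr1", 50000, "-", "GeneB")]
    for chrom, tss, strand, name in genes:
        start, end = promoter_window(tss, strand)
        # Print a BED6 line: chrom, start, end, name, score, strand
        print(f"{chrom}\t{start}\t{end}\t{name}\t0\t{strand}")
    # Prints:
    # chr1  8500   10500  GeneA  0  +
    # chr1  49501  51501  GeneB  0  -
```

The resulting BED file can then be used to extract sequences for motif scanning. Since there is no universal window size, it may be worth rerunning the analysis with a few window sizes to see whether the enriched factors are stable.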
The simplest solution is probably to use existing tools to address this question. For example, GATHER calculates TRANSFAC enrichment for about a half dozen species (including mouse):
I've also saved a list of TF-enrichment tools that I have found useful:
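As a rough illustration of what "over-represented" can mean, one common approach is a one-sided hypergeometric test: of M background genes, K have a predicted site for a given TF, and you ask how surprising it is to see at least k of them in your set of n genes. This is a generic sketch, not necessarily the statistic GATHER or the tools above use, and the counts in the usage line are made up.

```python
from math import comb


def enrichment_pvalue(k, n_set, K, M):
    """One-sided hypergeometric P(X >= k).

    k:     genes in your set whose promoters contain the site
    n_set: size of your gene set
    K:     background genes whose promoters contain the site
    M:     total background genes
    """
    total = comb(M, n_set)
    upper = min(K, n_set)
    # Sum the probability of seeing i genes with the site, for i = k..upper.
    # math.comb returns 0 when the second argument exceeds the first, so
    # impossible terms contribute nothing.
    hits = sum(comb(K, i) * comb(M - K, n_set - i) for i in range(k, upper + 1))
    return hits / total


if __name__ == "__main__":
    # Hypothetical counts: 12 of 50 genes have the site, vs 1000 of 20000 overall
    print(enrichment_pvalue(k=12, n_set=50, K=1000, M=20000))
```

Because you would typically test hundreds of TRANSFAC matrices at once, the raw p-values need a multiple-testing correction (e.g. Bonferroni or Benjamini-Hochberg) before calling anything enriched.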