I am trying to find out whether a set of sequences I am studying contains regulatory regions. I have seen many tools for this, such as Jaspar, Transfac, and MEME, as well as the option of intersecting the coordinates of my sequences with those of the motifs described in the Genome Browser. Are there best practices for this kind of analysis? I do not have much experience with it.
Roughly how many regions are we talking about? Hundreds, thousands?
About 5 thousand.
Then I would simply perform a motif enrichment analysis, using either MEME or Homer, with the whole genome as background.
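To make the idea of enrichment against a background concrete, here is a minimal Python sketch of the underlying test. It is not what MEME or Homer do internally (they score position weight matrices and choose backgrounds more carefully); it only illustrates the principle with an exact string match and a hypergeometric tail. The function names, the toy sequences, and the example motif CACGTG are all assumptions made for illustration.

```python
from math import comb


def count_hits(sequences, motif):
    """Count the sequences that contain the exact motif at least once."""
    motif = motif.upper()
    return sum(1 for seq in sequences if motif in seq.upper())


def hypergeom_enrichment_p(hits_target, n_target, hits_background, n_background):
    """One-sided hypergeometric p-value for over-representation.

    Probability of seeing at least `hits_target` motif-containing sequences
    when `n_target` sequences are drawn at random from the pooled set of
    target and background sequences.
    """
    total = n_target + n_background
    total_hits = hits_target + hits_background
    upper = min(n_target, total_hits)
    tail = sum(
        comb(total_hits, k) * comb(total - total_hits, n_target - k)
        for k in range(hits_target, upper + 1)
    )
    return tail / comb(total, n_target)


# Toy data (hypothetical): 4 regions of interest, 8 background regions.
target = ["ACGTGACC", "TTCACGTG", "GGGCACGTGA", "AAAATTTT"]
background = [
    "AAAAAAAA", "CCCCCCCC", "ACGTACGT", "TTTTGGGG",
    "CACGTGTT", "GATTACAG", "TTTTTTTT", "CCGGCCGG",
]
motif = "CACGTG"

hits_t = count_hits(target, motif)
hits_b = count_hits(background, motif)
p_value = hypergeom_enrichment_p(hits_t, len(target), hits_b, len(background))
print(f"target {hits_t}/{len(target)}, background {hits_b}/{len(background)}, p = {p_value:.3f}")
```

With about 5 thousand regions and a genome-wide background the same logic applies, just with far larger counts and one test per motif, which is why the dedicated tools are the practical choice.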
Thanks! And what do you think about intersecting the coordinates of my sequences with the coordinates of the TFBS described in the Genome Browser?
I do not think this is meaningful. Motifs occur all over the genome simply through random nucleotide co-occurrence, which is why proper statistics and control of false positives (FDR) are so important. To check whether your regions stand out from random motif occurrences, you have to perform an enrichment analysis. Running it against the genome as background filters out a lot of common motifs that are found pretty much everywhere. Simple intersection will probably give you an excessive number of motifs, many of them present just by chance and without any biological function.
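To put rough numbers on the "by chance" point, here is a small Python sketch under simplifying assumptions: a uniform random sequence (each base with probability 0.25), a single strand, and exact matching of one fixed k-mer. Real genomes are not uniform, so treat the output as an order-of-magnitude illustration only. The second function is a plain Benjamini-Hochberg adjustment, one common way to control the FDR when many motifs are tested at once. Both function names and the example p-values are hypothetical.

```python
def expected_chance_hits(motif_length, sequence_length):
    """Expected exact matches of one fixed k-mer in a uniform random sequence.

    Assumes independent bases with probability 0.25 each, one strand only.
    """
    positions = max(sequence_length - motif_length + 1, 0)
    return positions * 0.25 ** motif_length


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# A 6-mer in a genome assumed to be 3 Gb long (single strand).
print(f"expected chance hits: {expected_chance_hits(6, 3_000_000_000):,.0f}")

# Hypothetical raw p-values from testing four motifs.
raw = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(raw))
```

Under these assumptions a single 6-mer is expected several hundred thousand times in a 3 Gb sequence without any biology behind it, so an overlap on its own says very little.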
But if I intersect with the coordinates of conserved TFBS, couldn't I then assume that those "reference" motifs exist in my sequences?