I am trying to locate binding sites for a single, specific transcription factor in sequences of over 100 kb for ~1000 genes. However good the binding matrix is, and however much I minimize the false-positive rate, every matrix has some error rate, so in sequences this long a binding site will be found in every gene. I therefore want to find the genes whose regulatory sequence is enriched in that specific binding site.
Which test should I use for such an enrichment analysis, and how should I apply it?
I can calculate the number of hits per gene in the test genes, and I know the approximate error rate of the binding matrix per kb for a given similarity cut-off (as given in the Transfac database).
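To show how these two quantities could be combined, here is a minimal sketch of one possible approach, not an established recommendation. It assumes that false-positive hits occur independently at a constant rate per kb, so that the hit count in a gene is roughly Poisson-distributed with mean equal to rate times sequence length. The gene names, hit counts, lengths, and the rate of 0.05 per kb are made-up placeholders, and the helpers `poisson_upper_tail` and `enrichment_pvalues` are hypothetical names.

```python
import math

def poisson_upper_tail(k, mu):
    """Return P(X >= k) for X ~ Poisson(mu), via 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)      # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= mu / i        # P(X = i) derived from P(X = i - 1)
        cdf += term
    # 1 - cdf loses precision for very small tails and underflows for very
    # large mu; this is only meant for modest expected counts.
    return max(0.0, 1.0 - cdf)

def enrichment_pvalues(hits, length_kb, rate_per_kb):
    """Per-gene p-value for seeing at least the observed number of hits,
    assuming (hypothetically) Poisson-distributed false positives.
    Returns {gene: (raw_p, bonferroni_p)}."""
    n_genes = len(hits)
    results = {}
    for gene, observed in hits.items():
        expected = rate_per_kb * length_kb[gene]   # expected false positives
        p = poisson_upper_tail(observed, expected)
        results[gene] = (p, min(1.0, p * n_genes)) # Bonferroni correction
    return results

# Placeholder data: two hypothetical genes, 100 kb each, 0.05 false hits per kb.
hits = {"geneA": 12, "geneB": 3}
length_kb = {"geneA": 100.0, "geneB": 100.0}
results = enrichment_pvalues(hits, length_kb, rate_per_kb=0.05)

for gene, (p, p_adj) in results.items():
    print(f"{gene}: p = {p:.4g}, Bonferroni-adjusted p = {p_adj:.4g}")
```

With an expected count of 5 false hits per gene, 12 observed hits would look enriched under this model while 3 would not. If SciPy is available, `scipy.stats.poisson.sf(observed - 1, expected)` computes the same upper tail more robustly. Whether the Poisson assumption holds for real matrix hits (which can overlap or cluster) is exactly the kind of thing I am unsure about.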
Thanks for your help.