I am analyzing CpG dinucleotide patterns in the promoter-binding consensus sequences of as many TFs as possible, using R. My approach is to first retrieve known consensus sequences (e.g. for HRE, CREB, etc.) and then determine whether CpG *dinucleotides* (**NOT** CpG islands) are over- or under-represented within them, by comparing their frequency in each sequence either with their frequency in the entire genome or with that in a randomly generated sequence of similar length. I have been using the R packages "MotifDb" and "seqinr", but I couldn't find a package that performs the over-/under-representation statistics. I have the following questions:

(1) Any suggestions for packages? Also, which statistic or test would be most appropriate for the representation analysis (rho, z-score, or something else)?

(2) Are there any packages that allow the same analysis for TFs that lack a known, validated consensus sequence? My idea is to first construct random DNA strings of variable length, but I am not sure how to go about it.
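
To make the question concrete, here is a minimal base-R sketch of the kind of calculation I have in mind. The function names (`cpg_count`, `cpg_rho`, `random_dna`, `shuffle_seq`, `cpg_zscore`) are my own placeholders and do not come from MotifDb or seqinr. The uniform base frequencies and the use of the CRE palindrome `TGACGTCA` as the test sequence are assumptions for illustration only. The null model here is a simple mononucleotide shuffle, which may not be the right choice.

```r
# Count overlapping CpG dinucleotides in a DNA string.
cpg_count <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  n <- length(bases)
  if (n < 2) return(0L)
  sum(bases[-n] == "C" & bases[-1] == "G")
}

# Observed/expected ratio: rho = f(CG) / (f(C) * f(G)).
# rho < 1 suggests depletion, rho > 1 suggests enrichment.
cpg_rho <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  n <- length(bases)
  if (n < 2) return(NA_real_)
  f_cg <- cpg_count(seq) / (n - 1)   # n - 1 overlapping dinucleotides
  f_c <- mean(bases == "C")
  f_g <- mean(bases == "G")
  if (f_c == 0 || f_g == 0) return(NA_real_)
  f_cg / (f_c * f_g)
}

# Random DNA string of a given length (uniform base frequencies assumed;
# genome-wide frequencies could be passed in via `probs` instead).
random_dna <- function(len, probs = c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)) {
  paste(sample(names(probs), len, replace = TRUE, prob = probs), collapse = "")
}

# Shuffle the bases of a sequence, preserving its base composition.
shuffle_seq <- function(seq) {
  paste(sample(strsplit(seq, "")[[1]]), collapse = "")
}

# z-score of the observed CpG count against a shuffled null distribution.
cpg_zscore <- function(seq, n_perm = 1000) {
  observed <- cpg_count(seq)
  null_counts <- replicate(n_perm, cpg_count(shuffle_seq(seq)))
  s <- sd(null_counts)
  z <- if (s > 0) (observed - mean(null_counts)) / s else NA_real_
  list(observed = observed, null_mean = mean(null_counts), z = z)
}

set.seed(1)                      # reproducible shuffles
cre <- "TGACGTCA"                # illustrative sequence only
cpg_rho(cre)                     # 16/7, about 2.29
cpg_zscore(cre, n_perm = 1000)$z
random_dna(12)
```

My worry is that consensus sequences are so short (often 6 to 12 bp) that a per-sequence rho or z-score like this will be very noisy, so I am unsure whether the sequences should be pooled or tested some other way.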

